Abstract



It is known that if a 2-universal hash function $H$ is applied to elements of a block source
$(X_1, \ldots, X_T)$, where each item $X_i$ has enough min-entropy conditioned on the previous
items, then the output distribution $(H, H(X_1), \ldots, H(X_T))$ will be "close" to the uniform
distribution. We provide improved bounds on how much min-entropy per item is required 
for this to hold, both when we ask that the output be close to uniform in statistical distance 
and when we only ask that it be statistically close to a distribution with small collision 
probability. In both cases, we reduce the dependence of the min-entropy on the number T 
of items from 2 log T in previous work to log T, which we show to be optimal. This leads 
to corresponding improvements to the recent results of Mitzenmacher and Vadhan (SODA 
'08) on the analysis of hashing-based algorithms and data structures when the data items 
come from a block source. 

1 Introduction 

A block source is a sequence of items $X = (X_1, \ldots, X_T)$ in which each item has at least some
k bits of "entropy" conditioned on the previous ones [CG88]. Previous works [CG88, Zuc96,
MV08] have analyzed what happens when one applies a 2-universal hash function to each item
in such a sequence, establishing results of the following form:

Block-Source Hashing Theorems (informal): If $(X_1, \ldots, X_T)$ is a block source
with k bits of "entropy" per item and H is a random hash function from a 2-universal
family mapping to $m \ll k$ bits, then $(H(X_1), \ldots, H(X_T))$ is "close" to
the uniform distribution.

*An extended abstract of this paper will appear in RANDOM '08 [CV08] . 

†Work done when visiting U.C. Berkeley, supported by US-Israel BSF grant 2002246 and NSF grant CNS-0430336.

‡Work done when visiting U.C. Berkeley, supported by the Miller Institute for Basic Research in Science, a
Guggenheim Fellowship, US-Israel BSF grant 2006060, and ONR grant N00014-04-1-0478.






In this paper, we prove new results of this form, achieving improved (in some cases, optimal) 
bounds on how much entropy k per item is needed to ensure that the output is close to uniform, 
as a function of the other parameters (the output length m of the hash functions, the number 
T of items, and the "distance" from the uniform distribution). But first we discuss the two 
applications that have motivated the study of Block-Source Hashing Theorems. 

1.1 Applications of Block-Source Hashing 

Randomness Extractors. A randomness extractor is an algorithm that extracts almost- 
uniform bits from a source of biased and correlated bits, using a short seed of truly random 
bits as a catalyst [NZ96]. Extractors have many applications in theoretical computer science
and have played a central role in the theory of pseudorandomness. (See the surveys [NT99,
Sha04, Vad07].) Block-Source Hashing Theorems immediately yield methods for extracting
randomness from block sources, where the seed is used to specify a universal hash function. 
The gain over hashing the entire T-tuple at once is that the blocks may be much shorter than 
the entire sequence, and thus a much shorter seed is required to specify the universal hash 
function. Moreover, many subsequent constructions of extractors for general sources (without 
the block structure) work by first converting the source into a block source and performing 
block-source hashing. 

Analysis of Hashing-Based Algorithms. The idea of hashing has been widely applied in
designing algorithms and data structures, including hash tables [Knu98], Bloom filters [BM03],
summary algorithms for data streams [Mut03], etc. Given a stream of data items $(x_1, \ldots, x_T)$,
we first hash the items into $(H(x_1), \ldots, H(x_T))$, and carry out a computation using the hashed values. In the literature, the
analysis of a hashing algorithm is typically a worst-case analysis on the input data items, and the 
best results are often obtained by unrealistically modelling the hash function as a truly random 
function mapping the items to uniform and independent m-bit strings. On the other hand, 
for realistic, efficiently computable hash functions (e.g., 2-universal or $O(1)$-wise independent
hash functions), the provable performance is sometimes significantly worse. However, such
gaps do not seem to show up in practice, and even standard 2-universal hash functions empirically
seem to match the performance of truly random hash functions. To explain this phenomenon,
Mitzenmacher and Vadhan [MV08] have suggested that the discrepancy is due to worst-case
analysis, and have proposed to instead model the input items as coming from a block source. Then
Block-Source Hashing Theorems imply that the performance of universal hash functions is close 
to that of truly random hash functions, provided that each item has enough bits of entropy. 
1.2 How Much Entropy is Required?

A natural question about Block-Source Hashing Theorems is: how large does the "entropy"
k per item need to be to ensure a certain amount of "closeness" to uniform (where both the
entropy and closeness can be measured in various ways)? This also has practical significance
for the latter motivation regarding hashing-based algorithms, as it corresponds to the amount
of entropy we need to assume in data items. In [MV08], the authors provide bounds on the entropy
required for two measures of closeness, and use these as basic tools to bound the required 
entropy in various applications. The requirement is usually some small constant multiple of 
log T, where T is the number of items in the source, which can be on the borderline between a 
reasonable and unreasonable assumption about real-life data. Therefore, it is interesting to pin 
down the optimal answers to these questions. In what follows, we first summarize the previous 
results, and then discuss our improved analysis and corresponding lower bounds. 

A standard way to measure the distance of the output from the uniform distribution is
by statistical distance.^1 In the randomness extractor literature, classic results [CG88, ILL89,
Zuc96] show that using 2-universal hash functions, $k = m + 2\log(T/\varepsilon) + O(1)$ bits of min-entropy
(or even Rényi entropy^2) per item is sufficient for the output distribution to be $\varepsilon$-close
to uniform in statistical distance. Sometimes a less stringent closeness requirement is sufficient,
where we only require that the output distribution is $\varepsilon$-close to a distribution having "small"
collision probability.^3 A result of [MV08] shows that $k = m + 2\log T + \log(1/\varepsilon) + O(1)$ suffices
to achieve this requirement. Using 4-wise independent hash functions, [MV08] further reduce
the required entropy to $k = \max\{m + \log T, (1/2)(m + 3\log T + \log(1/\varepsilon))\} + O(1)$.

Our Results. We reduce the entropy required in the previous results, as summarized in Table 1.
Roughly speaking, we save an additive $\log T$ bits of min-entropy (or Rényi entropy) in
all cases. We show that using universal hash functions, $k = m + \log T + 2\log(1/\varepsilon) + O(1)$ bits per
item is sufficient for the output to be $\varepsilon$-close to uniform, and $k = m + \log(T/\varepsilon) + O(1)$ is enough
for the output to be $\varepsilon$-close to having small collision probability. Using 4-wise independent hash
functions, the entropy k further reduces to $\max\{m + \log T, (1/2)(m + 2\log T + \log(1/\varepsilon))\} + O(1)$.
The results hold even if we consider the joint distribution $(H, H(X_1), \ldots, H(X_T))$ (corresponding
to "strong extractors" in the literature on randomness extractors). Substituting our improved
bounds in the analysis of hashing-based algorithms from [MV08], we obtain similar
reductions in the min-entropy required for every application with 2-universal hashing. With
4-wise independent hashing, we obtain a slight improvement for Linear Probing, and for the

^1 The statistical distance of two random variables X and Y is $\Delta(X, Y) = \max_T |\Pr[X \in T] - \Pr[Y \in T]|$,
where T ranges over all possible events.

^2 The min-entropy of a random variable X is $H_\infty(X) = \min_x \log(1/\Pr[X = x])$. All of the results mentioned
actually hold for the less stringent measure of Rényi entropy $H_2(X) = \log(1/\mathrm{E}_{x \leftarrow X}[\Pr[X = x]])$.

^3 The collision probability of a random variable X is $\sum_x \Pr[X = x]^2$. By "small collision probability," we
mean that the collision probability is within a constant factor of the collision probability of the uniform distribution.
Setting                   Previous Results                                  Our Results

2-universal hashing,      m + 2 log T + 2 log(1/ε)                          m + log T + 2 log(1/ε)
ε-close to uniform        [CG88, ILL89, Zuc96]

2-universal hashing,      m + 2 log T + log(1/ε)                            m + log T + log(1/ε)
ε-close to small cp.      [MV08]

4-wise indep. hashing,    max{m + log T,                                    max{m + log T,
ε-close to small cp.      (1/2)(m + 3 log T + log(1/ε))} [MV08]             (1/2)(m + 2 log T + log(1/ε))}

Table 1: Our Results: Each entry denotes the min-entropy (actually, Rényi entropy) required
per item when hashing a block source of T items to m-bit strings to ensure that the output
has statistical distance at most ε from uniform (or from having collision probability within a
constant factor of uniform). Additive constants are omitted for readability.

other applications, we show that the previous bounds can already be achieved with 2-universal
hashing. The results are summarized in Table 2.

Although the $\log T$ improvement seems small, we remark that it could be significant for
practical settings of parameters. For example, suppose we want to hash 64 thousand internet
traffic flows, so $\log T \approx 16$. Each flow is specified by the 32-bit IP addresses and 16-bit port
numbers for the source and destination plus the 8-bit transport protocol, for a total of 104
bits. There is a noticeable difference between assuming that each flow contains $3\log T \approx 48$
vs. $4\log T \approx 64$ bits of entropy, as the flows are only 104 bits long and are very structured.
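
To make the comparison concrete, the following minimal Python sketch evaluates the per-item entropy bounds of Table 1 for given parameters. The function name `required_entropy_bits` and the dictionary layout are illustrative choices made here, not notation from the paper, and the additive O(1) terms are dropped exactly as in the table.

```python
import math


def required_entropy_bits(m, T, eps):
    """Per-item entropy (in bits) from Table 1, with additive O(1) terms dropped.

    m   : output length of the hash function in bits
    T   : number of items in the block source
    eps : statistical distance parameter
    """
    log_T = math.log2(T)
    log_inv_eps = math.log2(1.0 / eps)
    return {
        "2-universal, close to uniform": {
            "previous": m + 2 * log_T + 2 * log_inv_eps,
            "ours": m + log_T + 2 * log_inv_eps,
        },
        "2-universal, close to small cp": {
            "previous": m + 2 * log_T + log_inv_eps,
            "ours": m + log_T + log_inv_eps,
        },
        "4-wise independent, close to small cp": {
            "previous": max(m + log_T, 0.5 * (m + 3 * log_T + log_inv_eps)),
            "ours": max(m + log_T, 0.5 * (m + 2 * log_T + log_inv_eps)),
        },
    }


# Illustrative parameters (an assumption for this example): 2^16 items, m = 16, eps = 1/2.
bounds = required_entropy_bits(m=16, T=2**16, eps=0.5)
```

With these illustrative parameters, the two 2-universal rows differ from the previous bounds by exactly $\log T = 16$ bits, which is the saving discussed above.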

We also prove corresponding lower bounds showing that our upper bounds are almost
tight. Specifically, we show that when the data items do not have enough entropy, the joint
distribution $(H, H(X_1), \ldots, H(X_T))$ can be "far" from uniform. More precisely, we show that
if $k = m + \log T + 2\log(1/\varepsilon) - O(1)$, then there exists a block source $(X_1, \ldots, X_T)$ with k
bits of min-entropy per item such that the distribution $(H, H(X_1), \ldots, H(X_T))$ is $\varepsilon$-far from
uniform in statistical distance (for H coming from any hash family). This matches our upper
bound up to an additive constant. Similarly, we show that if $k = m + \log T - O(1)$, then
there exists a block source $(X_1, \ldots, X_T)$ with k bits of min-entropy per item such that the
distribution $(H, H(X_1), \ldots, H(X_T))$ is 0.99-far from having small collision probability (for H
coming from any hash family). This matches our upper bound up to an additive constant
in case the statistical distance parameter $\varepsilon$ is constant; we also exhibit a specific 2-universal
family for which the $\log(1/\varepsilon)$ in our upper bound is nearly tight: it cannot be reduced below
$\log(1/\varepsilon) - \log\log(1/\varepsilon)$. Finally, we also extend all of our lower bounds to the case that we only
consider the distribution of hashed values $(H(X_1), \ldots, H(X_T))$, rather than their joint distribution
with H. For this case, the lower bounds are necessarily reduced by a term that depends on the
size of the hash family. (For standard constructions of universal hash functions, this amounts
to $\log n$ bits of entropy, where n is the bit-length of an individual item.)
Type of Hash Family        Previous Results [MV08]     Our Results

Linear Probing
  2-universal hashing      4 log T                     3 log T
  4-wise independence      2.5 log T                   2 log T

Balanced Allocations with d Choices
  2-universal hashing      (d + 2) log T               (d + 1) log T
  4-wise independence      (d + 1) log T

Bloom Filters
  2-universal hashing      4 log T                     3 log T
  4-wise independence      3 log T

Table 2: Applications: Each entry denotes the min-entropy (actually, Rényi entropy) required
per item to ensure that the performance of the given application is "close" to the performance
when using truly random hash functions. In all cases, the bounds omit additive terms that
depend on how close a performance is desired, and we restrict to the (standard) case that the
size of the hash table is linear in the number of items being hashed. That is, $m = \log T + O(1)$.

Techniques. At a high level, all of the previous analyses for hashing block sources were loose
due to summing error probabilities over the T blocks. Our improvements come from avoiding
this linear blow-up by choosing more refined measures of error. For example, when we want the
output to have small statistical distance from uniform, the classic Leftover Hash Lemma [ILL89]
says that min-entropy $k = m + 2\log(1/\varepsilon_0)$ suffices for a single hashed block to be $\varepsilon_0$-close to
uniform, and then a "hybrid argument" implies that the joint distribution of T hashed blocks
is $T\varepsilon_0$-close to uniform [Zuc96]. Setting $\varepsilon_0 = \varepsilon/T$, this leads to a min-entropy requirement of
$k = m + 2\log(1/\varepsilon) + 2\log T$ per block. We obtain a better bound, reducing $2\log T$ to $\log T$,
by using Hellinger distance to analyze the error accumulation over blocks, and only passing to
statistical distance at the end.

For the case where we only want the output to be close to having small collision probability,
the previous analysis of [MV08] worked by first showing that the expected collision probability
of each hashed block $h(X_i)$ is "small" even conditioned on previous blocks, then using Markov's
Inequality to deduce that each hashed block has small collision probability except with some
probability $\varepsilon_0$, and finally doing a union bound to deduce that all hashed blocks have small
collision probability except with probability $T\varepsilon_0$. We avoid the union bound by working with
more refined notions of "conditional collision probability," which enable us to apply Markov's
Inequality on the entire sequence rather than on each block individually.

The starting point for our negative results is the tight lower bound for randomness extractors
due to Radhakrishnan and Ta-Shma [RT00]. Their methods show that if the min-entropy
parameter k is not large enough, then for any hash family, there exists a (single-block) source
X such that h(X) is "far" from uniform (in statistical distance) for "many" hash functions
h. We then take our block source $(X_1, \ldots, X_T)$ to consist of T iid copies of X, and argue
that the statistical distance from uniform grows sufficiently fast with the number T of copies
taken. For example, we show that if two distributions have statistical distance $\varepsilon$, then their
T-fold products have statistical distance $\Omega(\min\{1, \sqrt{T} \cdot \varepsilon\})$, strengthening a previous bound of
Reyzin [Rey04], who proved a bound of $\Omega(\min\{\varepsilon^{1/3}, \sqrt{T} \cdot \varepsilon\})$.
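
As a small numerical illustration of this growth (not part of the paper's proof), the sketch below computes the exact statistical distance between T iid fair bits and T iid bits with bias $1/2 + \varepsilon$. The choice of Bernoulli distributions and the function name `product_distance` are assumptions made for this example only.

```python
from math import comb


def product_distance(T, eps):
    """Exact statistical distance between T iid Bernoulli(1/2) bits
    and T iid Bernoulli(1/2 + eps) bits.

    Both product distributions assign a string a probability that depends
    only on its number of ones, so the distance equals the statistical
    distance between Binomial(T, 1/2) and Binomial(T, 1/2 + eps).
    """
    p, q = 0.5, 0.5 + eps
    total = 0.0
    for ones in range(T + 1):
        weight_p = comb(T, ones) * p**ones * (1 - p) ** (T - ones)
        weight_q = comb(T, ones) * q**ones * (1 - q) ** (T - ones)
        total += abs(weight_p - weight_q)
    return total / 2


single = product_distance(1, 0.01)      # one copy: distance is eps itself
hundred = product_distance(100, 0.01)   # 100 copies
four_hundred = product_distance(400, 0.01)
```

For a single copy the distance is exactly $\varepsilon$; for larger T the computed distance grows well beyond $\varepsilon$ while staying below $\sqrt{T} \cdot \varepsilon$ in these instances, which is the behavior the $\Omega(\min\{1, \sqrt{T} \cdot \varepsilon\})$ bound describes.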

2 Preliminaries 

Notations. All logarithms are base 2. We use the convention that $N = 2^n$, $K = 2^k$, and $M = 2^m$.
We think of a data item X as a random variable over $[N] = \{1, \ldots, N\}$, which can be viewed
as the set of n-bit strings. A hash function $h : [N] \to [M]$ hashes an item to an m-bit string. A
hash function family $\mathcal{H}$ is a multiset of hash functions, and H will usually denote a uniformly
random hash function drawn from $\mathcal{H}$. $U_{[M]}$ denotes the uniform distribution over [M]. Let
$X = (X_1, \ldots, X_T)$ be a sequence of data items. We use $X_{<i}$ to denote the first $i - 1$ items
$(X_1, \ldots, X_{i-1})$. We refer to $X_i$ as an item or a block interchangeably. Our goal is to study the
distribution of the hashed sequence $(H, Y) = (H, Y_1, \ldots, Y_T) \stackrel{\mathrm{def}}{=} (H, H(X_1), \ldots, H(X_T))$.

Hash Families. The truly random hash family $\mathcal{H}$ is the set of all functions from [N] to [M].
A hash family $\mathcal{H}$ is s-universal if for every sequence of distinct elements $x_1, \ldots, x_s \in [N]$,
$\Pr_H[H(x_1) = \cdots = H(x_s)] \le 1/M^{s-1}$. $\mathcal{H}$ is s-wise independent if for every sequence of distinct
elements $x_1, \ldots, x_s \in [N]$, $H(x_1), \ldots, H(x_s)$ are independent and uniform random variables
over [M].
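
For readers who want a concrete instance, here is a minimal Python sketch of one standard 2-universal family, the maps $x \mapsto ((ax + b) \bmod p) \bmod M$ over a prime p. This particular family, the domain $\{0, \ldots, p-1\}$, and all function names are assumptions chosen for illustration; the paper's results apply to any 2-universal family. The code checks the 2-universality condition $\Pr_H[H(x) = H(y)] \le 1/M$ exhaustively on a tiny example.

```python
from fractions import Fraction
from itertools import combinations


def make_family(p):
    """All pairs (a, b) with 1 <= a < p and 0 <= b < p; p is assumed prime."""
    return [(a, b) for a in range(1, p) for b in range(p)]


def apply_hash(h, x, p, M):
    """Evaluate the hash function indexed by h = (a, b) at the point x."""
    a, b = h
    return ((a * x + b) % p) % M


def max_pair_collision_probability(p, M):
    """Largest Pr_H[H(x) = H(y)] over all pairs of distinct x, y in {0, ..., p-1}."""
    family = make_family(p)
    worst = Fraction(0)
    for x, y in combinations(range(p), 2):
        collisions = sum(
            apply_hash(h, x, p, M) == apply_hash(h, y, p, M) for h in family
        )
        worst = max(worst, Fraction(collisions, len(family)))
    return worst


# Tiny illustrative parameters: p = 17 and M = 4 output values.
worst_case = max_pair_collision_probability(17, 4)
```

In this example every pair of distinct inputs collides with probability 7/34, which is below the 1/M = 1/4 threshold required for 2-universality.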

Block Sources and Collision Probability. For a random variable X, the collision probability
of X is $\mathrm{cp}(X) = \Pr[X = X'] = \sum_x \Pr[X = x]^2$, where X' is an independent copy of X.
The Rényi entropy $H_2(X) = \log(1/\mathrm{cp}(X))$ can be viewed as a measure of the amount of randomness
in X. (In the randomness extractor literature, the entropy is measured by min-entropy
$H_\infty(X) = \min_{x \in \mathrm{supp}(X)} \log(1/\Pr[X = x])$, but using the less stringent measure Rényi entropy
makes our results stronger since $H_2(X) \ge H_\infty(X)$.) For an event E, $(X|_E)$ is the random
variable defined by conditioning X on E.
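
The three quantities just defined are easy to compute for an explicit distribution. The sketch below is a minimal illustration; representing a distribution as a Python dictionary from values to probabilities, and the function names, are choices made here rather than anything prescribed by the paper.

```python
import math


def collision_probability(dist):
    """cp(X) = sum over x of Pr[X = x]^2, for dist given as {value: probability}."""
    return sum(pr * pr for pr in dist.values())


def renyi_entropy(dist):
    """Renyi entropy H_2(X) = log2(1 / cp(X))."""
    return math.log2(1.0 / collision_probability(dist))


def min_entropy(dist):
    """Min-entropy H_inf(X) = min over the support of log2(1 / Pr[X = x])."""
    return math.log2(1.0 / max(pr for pr in dist.values() if pr > 0))


example = {"a": 0.5, "b": 0.25, "c": 0.25}
cp_value = collision_probability(example)   # 0.25 + 0.0625 + 0.0625
h2 = renyi_entropy(example)
h_inf = min_entropy(example)
```

For this example the collision probability is 3/8, the min-entropy is 1 bit, and the Rényi entropy is $\log(8/3)$, which is larger, in line with $H_2(X) \ge H_\infty(X)$.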

Definition 2.1 (Block Sources) A sequence of random variables $(X_1, \ldots, X_T)$ over $[N]^T$ is
a block K-source if for every $i \in [T]$, and every $x_{<i}$ in the support of $X_{<i}$, we have
$\mathrm{cp}(X_i|_{X_{<i} = x_{<i}}) \le 1/K$. That is, each item $X_i$ has at least $k = \log K$ bits of Rényi entropy even after
conditioning on the previous items.

Let $X = (X_1, \ldots, X_T)$ be a sequence of random variables over $[M]^T$. We are interested
in bounding the overall collision probability cp(X) by the collision probabilities of its blocks.
If all the $X_i$'s are independent, then $\mathrm{cp}(X) = \prod_{i=1}^{T} \mathrm{cp}(X_i)$. The following lemma generalizes
Lemma 4.2 in [MV08]. It says that if for every $x \in \mathrm{supp}(X)$, the average collision probability of
the blocks $X_i$ conditioned on $X_{<i} = x_{<i}$ is small, then the overall collision probability cp(X)
is also small. In particular, if X is a block K-source, then $\mathrm{cp}(X) \le 1/K^T$.

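Lemma 2.2 Let $X = (X_1, \ldots, X_T)$ be a sequence of random variables such that for every
$x \in \mathrm{supp}(X)$,

\[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(X_i|_{X_{<i} = x_{<i}}) \le \alpha. \]

Then the overall collision probability satisfies $\mathrm{cp}(X) \le \alpha^T$.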
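Proof. By the Arithmetic Mean-Geometric Mean Inequality, the inequality in the premise implies
that for every $x \in \mathrm{supp}(X)$,

\[ \prod_{i=1}^{T} \mathrm{cp}(X_i|_{X_{<i} = x_{<i}}) \le \left( \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(X_i|_{X_{<i} = x_{<i}}) \right)^{T} \le \alpha^T. \]

Therefore, it suffices to prove

\[ \mathrm{cp}(X) \le \max_{x} \prod_{i=1}^{T} \mathrm{cp}(X_i|_{X_{<i} = x_{<i}}). \]

We prove it by induction on T. The base case T = 1 is trivial. Suppose the lemma is true for
T - 1. We have

\[
\begin{aligned}
\mathrm{cp}(X) &= \sum_{x_1} \Pr[X_1 = x_1]^2 \cdot \mathrm{cp}(X_2, \ldots, X_T|_{X_1 = x_1}) \\
&\le \mathrm{cp}(X_1) \cdot \max_{x_1} \mathrm{cp}(X_2, \ldots, X_T|_{X_1 = x_1}) \\
&\le \mathrm{cp}(X_1) \cdot \max_{x_1} \left( \max_{x_2, \ldots, x_T} \prod_{i=2}^{T} \mathrm{cp}(X_i|_{X_{<i} = x_{<i}}) \right) \\
&= \max_{x} \prod_{i=1}^{T} \mathrm{cp}(X_i|_{X_{<i} = x_{<i}}),
\end{aligned}
\]

as desired.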
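
The bound of Lemma 2.2, namely that cp(X) is at most $\alpha^T$ when $\alpha$ bounds the average conditional collision probability along every sequence in the support, can be sanity-checked on a toy source. In the sketch below, the three-block source `example_cond` is a made-up example and all function names are assumptions for illustration; the check is numerical and is not a substitute for the proof.

```python
def cp(dist):
    """Collision probability of a distribution given as {value: probability}."""
    return sum(pr * pr for pr in dist.values())


def joint_distribution(cond, T):
    """Expand conditional distributions cond(prefix) into a distribution over T-tuples."""
    joint = {(): 1.0}
    for _ in range(T):
        extended = {}
        for prefix, pr in joint.items():
            for value, q in cond(prefix).items():
                if q > 0:
                    extended[prefix + (value,)] = pr * q
        joint = extended
    return joint


def max_average_block_cp(cond, joint, T):
    """Max over x in supp(X) of (1/T) * sum_i cp(X_i | X_{<i} = x_{<i})."""
    return max(sum(cp(cond(x[:i])) for i in range(T)) / T for x in joint)


def example_cond(prefix):
    """Hypothetical source: each item's distribution depends on the previous item."""
    if not prefix:
        return {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
    if prefix[-1] % 2 == 0:
        return {0: 0.5, 1: 0.5}
    return {0: 0.5, 1: 0.25, 2: 0.25}


T = 3
joint = joint_distribution(example_cond, T)
alpha = max_average_block_cp(example_cond, joint, T)
overall_cp = cp(joint)   # the lemma predicts overall_cp <= alpha ** T
```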
Statistical Distance. The statistical distance is a standard way to measure the distance of
two distributions. Let X and Y be two random variables. The statistical distance of X and Y
is $\Delta(X, Y) = \max_T |\Pr[X \in T] - \Pr[Y \in T]| = (1/2) \cdot \sum_x |\Pr[X = x] - \Pr[Y = x]|$, where T
ranges over all possible events. When $\Delta(X, Y) \le \varepsilon$, we say that X is $\varepsilon$-close to Y. Similarly, if
$\Delta(X, Y) \ge \varepsilon$, then X is $\varepsilon$-far from Y. The following standard lemma says that if X has small
collision probability, then X is close to uniform in statistical distance.

Lemma 2.3 Let X be a random variable over [M] such that $\mathrm{cp}(X) \le (1 + \varepsilon)/M$. Then
$\Delta(X, U_{[M]}) \le \sqrt{\varepsilon}$.
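
The relationship between collision probability and distance from uniform can be checked numerically. The sketch below computes both quantities for a small skewed distribution over M = 4 values and compares the distance with $\sqrt{\varepsilon}$, where $\varepsilon$ is defined by $\mathrm{cp}(X) = (1 + \varepsilon)/M$. The example distribution and the function names are assumptions made for illustration.

```python
import math


def collision_probability(dist):
    """cp(X) for a distribution given as {value: probability}."""
    return sum(pr * pr for pr in dist.values())


def statistical_distance(p, q):
    """Half the L1 distance between two distributions given as dictionaries."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def uniform(M):
    """Uniform distribution over {0, ..., M-1}."""
    return {x: 1.0 / M for x in range(M)}


M = 4
skewed = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
eps = M * collision_probability(skewed) - 1.0   # so that cp = (1 + eps) / M
distance = statistical_distance(skewed, uniform(M))
within_bound = distance <= math.sqrt(eps)
```

Here the collision probability is 0.30, so $\varepsilon = 0.2$, and the statistical distance from uniform is 0.2, comfortably below $\sqrt{0.2} \approx 0.447$.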

Conditional Collision Probability. Let (X, Y) be jointly distributed random variables.
We can define the conditional Rényi entropy of X conditioned on Y as follows.

Definition 2.4 The conditional collision probability of X conditioned on Y is $\mathrm{cp}(X|Y) =
\mathrm{E}_{y \leftarrow Y}[\mathrm{cp}(X|_{Y=y})]$. The conditional Rényi entropy is $H_2(X|Y) = \log(1/\mathrm{cp}(X|Y))$.

The following lemma says that as in the case of Shannon entropy, conditioning can only
decrease the entropy.

Lemma 2.5 Let (X, Y, Z) be jointly distributed random variables. We have $\mathrm{cp}(X) \le \mathrm{cp}(X|Y) \le
\mathrm{cp}(X|Y, Z)$.

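Proof. For the first inequality, we have

\[
\begin{aligned}
\mathrm{cp}(X) &= \sum_x \Pr[X = x]^2 \\
&= \sum_{y, y'} \Pr[Y = y] \cdot \Pr[Y = y'] \cdot \left( \sum_x \Pr[X = x | Y = y] \cdot \Pr[X = x | Y = y'] \right) \\
&\le \sum_{y, y'} \Pr[Y = y] \cdot \Pr[Y = y'] \cdot \left( \sum_x \Pr[X = x | Y = y]^2 \right)^{1/2} \cdot \left( \sum_x \Pr[X = x | Y = y']^2 \right)^{1/2} \\
&= \left( \mathop{\mathrm{E}}_{y \leftarrow Y} \left[ \mathrm{cp}(X|_{Y=y})^{1/2} \right] \right)^2 \\
&\le \mathrm{cp}(X|Y).
\end{aligned}
\]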
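For the second inequality, observe that for every y in the support of Y, we have $\mathrm{cp}(X|_{Y=y}) \le
\mathrm{cp}((X|_{Y=y})|(Z|_{Y=y}))$ from the first inequality. It follows that

\[
\begin{aligned}
\mathrm{cp}(X|Y) &= \mathop{\mathrm{E}}_{y \leftarrow Y}[\mathrm{cp}(X|_{Y=y})] \\
&\le \mathop{\mathrm{E}}_{y \leftarrow Y}[\mathrm{cp}((X|_{Y=y})|(Z|_{Y=y}))] \\
&= \mathop{\mathrm{E}}_{y \leftarrow Y}\left[ \mathop{\mathrm{E}}_{z \leftarrow (Z|_{Y=y})}[\mathrm{cp}(X|_{Y=y, Z=z})] \right] \\
&= \mathrm{cp}(X|Y, Z).
\end{aligned}
\]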
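
Both inequalities of Lemma 2.5 can be checked numerically on a small joint distribution. In the sketch below, the distribution `joint_xyz` over pairs (x, (y, z)) is a made-up example, and the helper names are assumptions for illustration; conditioning on Y alone is obtained by coarsening the pair (y, z) to its first component.

```python
def cp(dist):
    """Collision probability of a distribution given as {value: probability}."""
    return sum(pr * pr for pr in dist.values())


def marginal_x(joint):
    """Marginal of X from a joint distribution {(x, side_info): probability}."""
    out = {}
    for (x, _), pr in joint.items():
        out[x] = out.get(x, 0.0) + pr
    return out


def conditional_cp(joint):
    """cp(X | S) = E_{s <- S}[cp(X | S = s)] for joint = {(x, s): probability}."""
    rows = {}
    for (x, s), pr in joint.items():
        rows.setdefault(s, {})
        rows[s][x] = rows[s].get(x, 0.0) + pr
    total = 0.0
    for row in rows.values():
        p_s = sum(row.values())
        total += p_s * cp({x: pr / p_s for x, pr in row.items()})
    return total


def coarsen(joint, f):
    """Replace the side information s by f(s), merging probabilities."""
    out = {}
    for (x, s), pr in joint.items():
        key = (x, f(s))
        out[key] = out.get(key, 0.0) + pr
    return out


# Hypothetical joint distribution of X and the pair (Y, Z).
joint_xyz = {
    (0, (0, 0)): 0.20, (1, (0, 0)): 0.05,
    (0, (0, 1)): 0.05, (1, (0, 1)): 0.10,
    (0, (1, 0)): 0.10, (1, (1, 0)): 0.20,
    (0, (1, 1)): 0.05, (1, (1, 1)): 0.25,
}

cp_x = cp(marginal_x(joint_xyz))
cp_x_given_y = conditional_cp(coarsen(joint_xyz, lambda yz: yz[0]))
cp_x_given_yz = conditional_cp(joint_xyz)
```

For this example the three values are 0.52, 0.5875, and roughly 0.637, so the chain $\mathrm{cp}(X) \le \mathrm{cp}(X|Y) \le \mathrm{cp}(X|Y,Z)$ holds as the lemma states.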



3 Positive Results: How Much Entropy is Sufficient? 

In this section, we present our positive results, showing that the distribution of the hashed sequence
$(H, Y) = (H, H(X_1), \ldots, H(X_T))$ is close to uniform when H is a random hash function from
a 2-universal hash family, and $X = (X_1, \ldots, X_T)$ has sufficient entropy per block. The new
contribution is that we do not need $K = 2^k$ to be as large as in previous works, and so we reduce
the randomness required in the block source $X = (X_1, \ldots, X_T)$.



3.1 Small Collision Probability Using 2-universal Hash Functions 

Let $H : [N] \to [M]$ be a random hash function from a 2-universal family $\mathcal{H}$. We first study
the conditions under which $(H, Y) = (H, H(X_1), \ldots, H(X_T))$ is $\varepsilon$-close to having collision
probability $O(1/(|\mathcal{H}| \cdot M^T))$. This requirement is less stringent than $(H, Y)$ being $\varepsilon$-close
to uniform in statistical distance, and so requires fewer bits of entropy. Mitzenmacher and
Vadhan [MV08] show that this guarantee suffices for some hashing applications. They show
that $K \ge MT^2/\varepsilon$ is enough to satisfy the requirement. We save a factor of T, and show that
in fact, $K \ge MT/\varepsilon$ is sufficient. (Taking logs yields the first entry in Table 1, i.e., it suffices
to have Rényi entropy $k = m + \log T + \log(1/\varepsilon)$ per block.) Formally, we prove the following
theorem.

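Theorem 3.1 Let $H : [N] \to [M]$ be a random hash function from a 2-universal family $\mathcal{H}$.
Let $X = (X_1, \ldots, X_T)$ be a block K-source over $[N]^T$. For every $\varepsilon > 0$, the hashed sequence
$(H, Y) = (H, H(X_1), \ldots, H(X_T))$ is $\varepsilon$-close to a distribution $(H, Z) = (H, Z_1, \ldots, Z_T)$ such that

\[ \mathrm{cp}(H, Z) \le \frac{1}{|\mathcal{H}| \cdot M^T} \left( 1 + \frac{M}{K\varepsilon} \right)^{T}. \]

In particular, if $K \ge MT/\varepsilon$, then $(H, Z)$ has collision probability at most $(1 + 2MT/(K\varepsilon))/(|\mathcal{H}| \cdot M^T)$.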
To analyze the distribution of the hashed sequence $(H, Y)$, the starting point is the following
version of the Leftover Hash Lemma [BBR85, ILL89], which says that when we hash a random
variable X with enough entropy using a 2-universal hash function H, the conditional collision
probability of H(X) conditioned on H is small.

Lemma 3.2 (The Leftover Hash Lemma) Let $H : [N] \to [M]$ be a random hash function
from a 2-universal family $\mathcal{H}$. Let X be a random variable over [N] with $\mathrm{cp}(X) \le 1/K$. We
have $\mathrm{cp}(H(X)|H) \le 1/M + 1/K$.
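
The bound $\mathrm{cp}(H(X)|H) \le 1/M + 1/K$ can be verified exactly on a small instance. The sketch below uses the family $x \mapsto ((ax + b) \bmod p) \bmod M$ with p = 17 and M = 4, and a source that is uniform over 8 points (so K = 8). The family, the parameters, and the function name are assumptions made for this illustration only.

```python
from fractions import Fraction


def conditional_cp_of_hash(source, p, M):
    """cp(H(X) | H) = E_{h <- H}[cp(h(X))] for the family
    x -> ((a*x + b) mod p) mod M with 1 <= a < p, 0 <= b < p (p prime).

    source is a distribution over {0, ..., p-1} given as {x: Fraction}.
    """
    total = Fraction(0)
    count = 0
    for a in range(1, p):
        for b in range(p):
            # Distribution of h(X) for this particular h = (a, b).
            hashed = {}
            for x, pr in source.items():
                y = ((a * x + b) % p) % M
                hashed[y] = hashed.get(y, Fraction(0)) + pr
            total += sum(pr * pr for pr in hashed.values())
            count += 1
    return total / count


p, M, K = 17, 4, 8
flat_source = {x: Fraction(1, K) for x in range(K)}   # cp(X) = 1/K
value = conditional_cp_of_hash(flat_source, p, M)
bound = Fraction(1, M) + Fraction(1, K)
```

In this instance the conditional collision probability comes out to 83/272, about 0.305, below the bound 1/M + 1/K = 3/8.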

We now sketch how the hashed block source $Y = (Y_1, \ldots, Y_T) = (H(X_1), \ldots, H(X_T))$ is
analyzed in [MV08], and how we improve the analysis. The following natural approach is taken
in [MV08]. Since the data X is a block K-source, the Leftover Hash Lemma tells us that for
every block $i \in [T]$, if we condition on the previous blocks $X_{<i} = x_{<i}$, then the hashed value
$(Y_i|_{X_{<i}=x_{<i}})$ has small conditional collision probability, i.e., $\mathrm{cp}((Y_i|_{X_{<i}=x_{<i}})|H) \le 1/M + 1/K$.
This is equivalent to saying that the average collision probability of $(Y_i|_{X_{<i}=x_{<i}})$ over the choice
of the hash function H is small, i.e.,

\[ \mathop{\mathrm{E}}_{h \leftarrow H}\left[ \mathrm{cp}(h(X_i)|_{X_{<i}=x_{<i}}) \right] = \mathrm{cp}((Y_i|_{X_{<i}=x_{<i}})|H) \le \frac{1}{M} + \frac{1}{K}. \]

We can then use a Markov argument to say that for every block, with probability at least
$1 - \varepsilon/T$ over $h \leftarrow H$, the collision probability is at most $1/M + T/(K\varepsilon)$. We can then take a
union bound to say that for every $x \in \mathrm{supp}(X)$, at least a $(1 - \varepsilon)$-fraction of hash functions h
are good in the sense that $\mathrm{cp}(h(X_i)|_{X_{<i}=x_{<i}})$ is small for all blocks $i = 1, \ldots, T$. [MV08] shows
that if this condition is true for every $(h, x) \in \mathrm{supp}(H, X)$, then Y is a block $(1/M + T/(K\varepsilon))^{-1}$-source,
and thus the overall collision probability is at most $(1 + MT/(K\varepsilon))^T/M^T$. [MV08] also
shows how to modify an $\varepsilon$-fraction of the distribution to fix the bad hash functions, and thus
complete the analysis.

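The problem with the above analysis is that applying a Markov argument to each block and
then taking a union bound incurs a loss of a factor of T. To avoid this, we want to apply a Markov
argument only once to the whole sequence. For example, a natural thing to try is to sum over
blocks to get

\[ \mathop{\mathrm{E}}_{h \leftarrow H}\left[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(h(X_i)|_{X_{<i}=x_{<i}}) \right] = \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}((Y_i|_{X_{<i}=x_{<i}})|H) \le \frac{1}{M} + \frac{1}{K}, \]

and use a Markov argument to deduce that for every $x \in \mathrm{supp}(X)$, with probability $1 - \varepsilon$ over
$h \leftarrow H$, the average collision probability per block satisfies

\[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(h(X_i)|_{X_{<i}=x_{<i}}) \le \frac{1}{M} + \frac{1}{K\varepsilon}. \]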
We need to bound the collision probability of Y using this information. We may want to
apply Lemma 2.2, but it requires the information on $(1/T) \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{Y_{<i}=y_{<i}})$ instead of
$(1/T) \sum_{i=1}^{T} \mathrm{cp}(h(X_i)|_{X_{<i}=x_{<i}})$. That is, Lemma 2.2 requires us to condition on previous hashed
values $Y_{<i}$, whereas the above argument refers to conditioning on the un-hashed values $X_{<i}$.
The difficulty with directly reasoning about the former is that conditioned on the hashed values
$Y_{<i}$, the hash function H may no longer be uniform (as it is correlated with $Y_{<i}$) and thus the
Leftover Hash Lemma no longer applies.

To get around this issue, we work with the averaged form of conditional collision
probability $\mathrm{cp}(Y_i|H, Y_{<i})$, as in Definition 2.4. Our key observation is that now we can
apply Lemma 2.5 to deduce that for every block $i \in [T]$, the conditional collision probability
satisfies $\mathrm{cp}(Y_i|H, Y_{<i}) \le \mathrm{cp}(Y_i|H, X_{<i}) \le 1/M + 1/K$. Then, by a Markov argument, it follows
that with probability $1 - \varepsilon$ over $(h, y) \leftarrow (H, Y)$, the average collision probability satisfies

\[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \le \frac{1}{M} + \frac{1}{K\varepsilon}. \]

We can then modify an $\varepsilon$-fraction of the distribution, and apply Lemma 2.2 to complete the analysis.

The following lemma formalizes our claim that the conditional collision probability
of every block of $(H, Y)$ is small.

Lemma 3.3 Let $H : [N] \to [M]$ be a random hash function from a 2-universal family $\mathcal{H}$. Let
$X = (X_1, \ldots, X_T)$ be a block K-source over $[N]^T$. Let $(H, Y) = (H, H(X_1), \ldots, H(X_T))$. Then
$\mathrm{cp}(H) = 1/|\mathcal{H}|$ and for every $i \in [T]$, $\mathrm{cp}(Y_i|H, Y_{<i}) \le 1/M + 1/K$.

Proof. $\mathrm{cp}(H) = 1/|\mathcal{H}|$ is trivial since H is uniformly distributed over $\mathcal{H}$. Fix $i \in [T]$. By the
definition of block K-source, for every $x_{<i}$ in the support of $X_{<i}$, $\mathrm{cp}(X_i|_{X_{<i}=x_{<i}}) \le 1/K$. By
the Leftover Hash Lemma, we have $\mathrm{cp}((Y_i|_{X_{<i}=x_{<i}})|(H|_{X_{<i}=x_{<i}})) \le 1/M + 1/K$ for every $x_{<i}$.
It follows that $\mathrm{cp}(Y_i|H, X_{<i}) \le 1/M + 1/K$. Now, we can think of $(Y_i|H, X_{<i})$ as $Y_i$ first
conditioned on $(H, Y_{<i})$, and then further conditioned on $X_{<i}$. By Lemma 2.5, we have

\[ \mathrm{cp}(Y_i|H, Y_{<i}) \le \mathrm{cp}(Y_i|H, Y_{<i}, X_{<i}) = \mathrm{cp}(Y_i|H, X_{<i}) \le 1/M + 1/K, \]

as desired. ■
We use this to prove Theorem 3.1 as outlined above.



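Proof of Theorem 3.1. By Lemma 3.3, for every $i \in [T]$, we have

\[ \mathop{\mathrm{E}}_{(h, y) \leftarrow (H, Y)}\left[ \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \right] = \mathrm{cp}(Y_i|H, Y_{<i}) \le \frac{1}{M} + \frac{1}{K}. \]

By linearity of expectation, the average conditional collision probability is also small:

\[ \mathop{\mathrm{E}}_{(h, y) \leftarrow (H, Y)}\left[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \right] \le \frac{1}{M} + \frac{1}{K}. \]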
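Note that the collision probability of a random variable over [M] is at least 1/M. Thus,
Markov's inequality implies that with probability at least $1 - \varepsilon$ over $(h, y) \leftarrow (H, Y)$,

\[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \le \frac{1}{M} + \frac{1}{K\varepsilon} = \frac{1}{M} \cdot \left( 1 + \frac{M}{K\varepsilon} \right). \]

In Lemma 3.4 below, we show how to fix the bad $(h, y)$'s by modifying at most an $\varepsilon$-fraction
of the distribution. Formally, Lemma 3.4 says that there exists a distribution $(H, Z) =
(H, Z_1, \ldots, Z_T)$ such that $(H, Y)$ is $\varepsilon$-close to $(H, Z)$, and for every $(h, z) \in \mathrm{supp}(H, Z)$,

\[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Z_i|_{(H, Z_{<i}) = (h, z_{<i})}) \le \frac{1}{M} \cdot \left( 1 + \frac{M}{K\varepsilon} \right). \]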

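Applying Lemma 2.2 on $(Z|_{H=h})$ for every $h \in \mathrm{supp}(H)$, we have

\[ \mathrm{cp}(H, Z) = \sum_{h} \Pr[H = h]^2 \cdot \mathrm{cp}(Z|_{H=h}) \le \frac{1}{|\mathcal{H}| \cdot M^T} \left( 1 + \frac{M}{K\varepsilon} \right)^{T}. \]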



Lemma 3.4 Let $(H, Y) = (H, Y_1, \ldots, Y_T)$ be jointly distributed random variables over $\mathcal{H} \times
[M]^T$ such that with probability at least $1 - \varepsilon$ over $(h, y) \leftarrow (H, Y)$, the average conditional
collision probability satisfies

\[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \le \frac{1}{M} + \alpha. \]

Then there exists a distribution $(H, Z) = (H, Z_1, \ldots, Z_T)$ such that $(H, Z)$ is $\varepsilon$-close to $(H, Y)$,
and for every $(h, z) \in \mathrm{supp}(H, Z)$, we have

\[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Z_i|_{(H, Z_{<i}) = (h, z_{<i})}) \le \frac{1}{M} + \alpha. \]

Furthermore, the marginal distribution of H is unchanged.

Proof. We define the distribution $(H, Z)$ as follows.
• Sample $(h, y) \leftarrow (H, Y)$.

• If $(1/T) \cdot \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \le 1/M + \alpha$, then output $(h, y)$.
• Otherwise, let $j \in [T]$ be the least index such that




• Choose $w_{j+1}, \ldots, w_T \leftarrow U_{[M]}$, and output $(h, y_1, \ldots, y_j, w_{j+1}, \ldots, w_T)$.

It is easy to check that (i) $(H, Z)$ is well-defined, (ii) $(H, Y)$ is $\varepsilon$-close to $(H, Z)$, (iii) for every
$(h, z) \in \mathrm{supp}(H, Z)$, $(1/T) \cdot \sum_{i=1}^{T} \mathrm{cp}(Z_i|_{(H, Z_{<i}) = (h, z_{<i})}) \le 1/M + \alpha$, and (iv) the marginal distribution
of H is unchanged. ■

3.2 Small Collision Probability Using 4-wise Independent Hash Functions 

As discussed in [MV08], using 4-wise independent hash functions $H : [N] \to [M]$ from $\mathcal{H}$, we
can further reduce the required randomness in the data $X = (X_1, \ldots, X_T)$. [MV08] shows that
in this case, $K \ge MT + \sqrt{2MT^3/\varepsilon}$ is enough for the hashed sequence $(H, Y)$ to be $\varepsilon$-close
to having collision probability $O(1/(|\mathcal{H}| \cdot M^T))$. As discussed in the previous subsection, by
avoiding using union bounds, we show that $K \ge MT + \sqrt{2MT^2/\varepsilon}$ suffices. (Taking logs yields
the second entry in Table 1, i.e., it suffices to have Rényi entropy $k = \max\{m + \log T, (1/2) \cdot
(m + 2\log T + \log(1/\varepsilon))\} + O(1)$ per block.) Formally, we prove the following theorem.

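Theorem 3.5 Let $H : [N] \to [M]$ be a random hash function from a 4-wise independent family
$\mathcal{H}$. Let $X = (X_1, \ldots, X_T)$ be a block K-source over $[N]^T$. For every $\varepsilon > 0$, the hashed sequence
$(H, Y) = (H, H(X_1), \ldots, H(X_T))$ is $\varepsilon$-close to a distribution $(H, Z) = (H, Z_1, \ldots, Z_T)$ such
that

\[ \mathrm{cp}(H, Z) \le \frac{1}{|\mathcal{H}| \cdot M^T} \left( 1 + \frac{M}{K} + \sqrt{\frac{2M}{K^2 \varepsilon}} \right)^{T}. \]

In particular, if $K \ge MT + \sqrt{2MT^2/\varepsilon}$, then $(H, Z)$ has collision probability at most $(1 +
\gamma)/(|\mathcal{H}| \cdot M^T)$ for $\gamma = 2 \cdot (MT + \sqrt{2MT^2/\varepsilon})/K$.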

The improvement of Theorem 3.5 over Theorem 3.1 comes from the fact that when we use 4-wise
independent hash families, we have a concentration result on the conditional collision probability
for each block, via the following lemma.

Lemma 3.6 ([MV08]) Let $H : [N] \to [M]$ be a random hash function from a 4-wise independent
family $\mathcal{H}$, and X a random variable over [N] with $\mathrm{cp}(X) \le 1/K$. Then we have

\[ \mathop{\mathrm{Var}}_{h \leftarrow H}[\mathrm{cp}(h(X))] \le \frac{2}{MK^2}. \]
We can then replace the application of Markov's Inequality in the proof of Theorem 3.1 by
Chebychev's Inequality to get a stronger result. Formally, we prove the following lemma, which
suffices to prove Theorem 3.5.



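Lemma 3.7 Let $H : [N] \to [M]$ be a random hash function from a 4-wise independent family
$\mathcal{H}$. Let $X = (X_1, \ldots, X_T)$ be a block K-source over $[N]^T$. Let $(H, Y) = (H, H(X_1), \ldots, H(X_T))$.
Then with probability at least $1 - \varepsilon$ over $(h, y) \leftarrow (H, Y)$,

\[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \le \frac{1}{M} \left( 1 + \frac{M}{K} + \sqrt{\frac{2M}{K^2 \varepsilon}} \right). \]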



Theorem 3.5 follows immediately by composing Lemmas 3.7, 3.4, and 2.2 in the same way
as in the proof of Theorem 3.1.



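Proof of Lemma 3.7. Recall that we have

\[ \mathop{\mathrm{E}}_{(h, y) \leftarrow (H, Y)}\left[ \frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \right] \le \frac{1}{M} + \frac{1}{K}. \]

Hence, our goal is to upper bound the probability of the value $(1/T) \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})})$
deviating from its mean by $\sqrt{2/(MK^2\varepsilon)}$. Our strategy is to bound the variance of a properly defined
random variable, and then apply Chebychev's Inequality. By Lemma 3.6, the information
we get from the 4-wise independent hash function is that for every $i \in [T]$, we have

\[ \mathop{\mathrm{Var}}_{h \leftarrow H}\left[ \mathrm{cp}(Y_i|_{(H, X_{<i}) = (h, x_{<i})}) \right] \le \frac{2}{MK^2} \qquad \forall x_{<i} \in \mathrm{supp}(X_{<i}). \tag{2} \]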

Fix $i \in [T]$, and let us try to bound the variance of the i-th block. There are two issues to take
care of. Firstly, the variance we have is conditioned on $X_{<i}$ instead of $Y_{<i}$. Secondly, even
when conditioning on $X_{<i}$, it is possible that the variance is



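The reason is that when conditioning on different $X_{<i} = x_{<i}$, the collision probability of $(Y_i|_{X_{<i}=x_{<i}})$
may have a different expectation over $h \leftarrow H$. Thus, we have to subtract the mean first. Let us
define

\[ f(h, x_{<i}) = \mathrm{cp}(Y_i|_{(H, X_{<i}) = (h, x_{<i})}) - \mathop{\mathrm{E}}_{h \leftarrow H}\left[ \mathrm{cp}(Y_i|_{(H, X_{<i}) = (h, x_{<i})}) \right]. \]

Now, for every $x_{<i} \in \mathrm{supp}(X_{<i})$, $f(H, x_{<i})$ has mean 0 and variance at most $2/(MK^2)$. It follows
that

\[ \mathop{\mathrm{Var}}_{(h, x_{<i}) \leftarrow (H, X_{<i})}[f(h, x_{<i})] \le \frac{2}{MK^2}. \]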
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We now deal with the issue of conditioning on X<j versus Y<j. Let us define 



We claim that 



9{h,y<i)= E [/(/i,x<i)]. 

X<i^{X<i\{H,Y^i)^{h,y^i)) 



M K 

Indeed, by Lemma 12.51 and the definition of / and 5, 

cp(^»l{H,y<.)=(fc,j/<,)) 
< cp((yi I (//,y<,)=(h,y<,)) I {Xi I (//,y<,)={/i,y<«) ) ) 

E [cp(li|{H,x<,)=(M<o)] 



E 

^<i^{X<i\{H,Y^^)^(h,y^i)) 



f{h,x<:i)+ E [cp(li|(H,x<,)=(/i,x<,)) 
/i<— 



Also note that g{H, Y<j) has mean and small variance: 

E [9{h,y<i)]= E [/(/i,x<,)] =0, 

(fe,y<,)^(H,y<,) {h,x)^(/f,X) 



Var \9(h,y<i)]< Var [/(/i,x<i)l< ^• 

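The above argument holds for every block $i \in [T]$. Taking the average over blocks, we get

$$\frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \le \frac{1}{T} \sum_{i=1}^{T} g(h, y_{<i}) + \frac{1}{M} + \frac{1}{K},$$

$$\mathrm{E}_{(h, \mathbf{y}) \leftarrow (H, \mathbf{Y})} \left[ \frac{1}{T} \sum_{i=1}^{T} g(h, y_{<i}) \right] = 0, \quad \text{and} \quad \mathrm{Var}_{(h, \mathbf{y}) \leftarrow (H, \mathbf{Y})} \left[ \frac{1}{T} \sum_{i=1}^{T} g(h, y_{<i}) \right] \le \frac{2}{MK^2}.$$

Finally, we can apply Chebyshev's Inequality to the random variable $(1/T) \sum_{i=1}^{T} g(H, Y_{<i})$ to get the desired result: with probability at least $1 - \varepsilon$ over $(h, \mathbf{y}) \leftarrow (H, \mathbf{Y})$,

$$\frac{1}{T} \sum_{i=1}^{T} \mathrm{cp}(Y_i|_{(H, Y_{<i}) = (h, y_{<i})}) \le \frac{1}{M} + \frac{1}{K} + \sqrt{\frac{2}{MK^2\varepsilon}} = \frac{1}{M} \left( 1 + \frac{M}{K} + \sqrt{\frac{2M}{K^2 \varepsilon}} \right).$$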
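The parameter arithmetic in the Chebyshev step above can be checked numerically. The following Python sketch is illustrative only: the function names and the sample values of M, K, and ε are assumptions made for this example and do not come from the paper.

```python
import math

def deviation(M, K, eps):
    # Deviation t = sqrt(2 / (M K^2 eps)) allowed around the mean.
    return math.sqrt(2.0 / (M * K * K * eps))

def chebyshev_failure_bound(M, K, eps):
    # Chebyshev: Pr[|Z - E[Z]| >= t] <= Var[Z] / t^2, with Var[Z] <= 2 / (M K^2).
    variance_bound = 2.0 / (M * K * K)
    return variance_bound / deviation(M, K, eps) ** 2

def lemma_3_7_rhs(M, K, eps):
    # (1/M) * (1 + M/K + sqrt(2M / (K^2 eps))), the bound stated in Lemma 3.7.
    return (1.0 / M) * (1.0 + M / K + math.sqrt(2.0 * M / (K * K * eps)))

if __name__ == "__main__":
    M, K, eps = 2 ** 10, 2 ** 20, 0.01  # hypothetical sample parameters
    print(chebyshev_failure_bound(M, K, eps))  # failure probability bound, equals eps up to rounding
    print(lemma_3_7_rhs(M, K, eps), 1 / M + 1 / K + deviation(M, K, eps))
```

The two printed quantities on the last line agree up to rounding, which reflects the identity $1/M + 1/K + \sqrt{2/(MK^2\varepsilon)} = (1/M)(1 + M/K + \sqrt{2M/(K^2\varepsilon)})$.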
3.3 Statistical Distance to Uniform Distribution

Let $H : [N] \to [M]$ be a random hash function from a 2-universal family $\mathcal{H}$. Let $\mathbf{X} = (X_1, \ldots, X_T)$ be a block $K$-source over $[N]^T$. In this subsection, we study the statistical distance between the distribution of the hashed sequence $(H, \mathbf{Y}) = (H, H(X_1), \ldots, H(X_T))$ and the uniform distribution $(H, U_{[M]^T})$. Classic results of [CG88], [ILL89], [Zuc96] show that if $K \ge MT^2/\varepsilon^2$, then $(H, \mathbf{Y})$ is $\varepsilon$-close to uniform. The proof idea is as follows. The Leftover Hash Lemma together with Lemma 2.3 tells us that the joint distribution of the hash function and a hashed value $(H, Y_i) = (H, H(X_i))$ is $\sqrt{M/K}$-close to uniform $U_{[M]}$ even when conditioned on the previous blocks $X_{<i}$. One can then use a hybrid argument to show that the distance grows linearly with the number of blocks, so $(H, \mathbf{Y})$ is $T \cdot \sqrt{M/K}$-close to uniform. Taking $K \ge MT^2/\varepsilon^2$ completes the analysis.

We save a factor of $T$, and show that in fact, $K = MT/\varepsilon^2$ is sufficient. (Taking logs yields the third entry in Table 1, i.e., it suffices to have Rényi entropy $k = m + \log T + 2\log(1/\varepsilon)$ per block.) Formally, we prove the following theorem.

Theorem 3.8 Let $H : [N] \to [M]$ be a random hash function from a 2-universal family $\mathcal{H}$. Let $\mathbf{X} = (X_1, \ldots, X_T)$ be a block $K$-source over $[N]^T$. For every $\varepsilon > 0$ such that $K \ge MT/\varepsilon^2$, the hashed sequence $(H, \mathbf{Y}) = (H, H(X_1), \ldots, H(X_T))$ is $\varepsilon$-close to uniform $(H, U_{[M]^T})$.

Recall that the previous analysis goes by passing to statistical distance first, and then measuring the growth of distance using statistical distance. This incurs a quadratic dependency of $K$ on $T$. Since the hybrid argument is tight without further information, to save a factor of $T$ we have to measure the increase of distance over blocks in another way, and pass to statistical distance only at the end. It turns out that the Hellinger distance (cf. [GS02]) is a good measure for our purposes:

Definition 3.9 (Hellinger distance) Let $X$ and $Y$ be two random variables over $[M]$. The Hellinger distance between $X$ and $Y$ is

$$d(X, Y) = \left( \frac{1}{2} \sum_i \Big( \sqrt{\Pr[X = i]} - \sqrt{\Pr[Y = i]} \Big)^2 \right)^{1/2} = \sqrt{1 - \sum_i \sqrt{\Pr[X = i] \cdot \Pr[Y = i]}}.$$

Like statistical distance, Hellinger distance is a distance measure for distributions, and it takes values in $[0, 1]$. The following standard lemma says that the two distance measures are closely related. We remark that the lemma is tight in both directions even if $Y$ is the uniform distribution.

Lemma 3.10 (cf. [GS02]) Let $X$ and $Y$ be two random variables over $[M]$. We have

$$d(X, Y)^2 \le \Delta(X, Y) \le \sqrt{2} \cdot d(X, Y).$$
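As a sanity check on Lemma 3.10, the two distances can be computed directly for small distributions. The Python sketch below is a minimal illustration under assumptions of our own: distributions are represented as plain lists of probabilities, and the helper names are hypothetical, not taken from the paper.

```python
import math
import random

def statistical_distance(p, q):
    """Delta(X, Y) = (1/2) * sum_i |Pr[X = i] - Pr[Y = i]|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger_distance(p, q):
    """d(X, Y) = sqrt((1/2) * sum_i (sqrt(Pr[X = i]) - sqrt(Pr[Y = i]))^2)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def random_distribution(m, rng):
    """A random probability vector of length m (illustrative helper)."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

def lemma_3_10_holds(p, q, tol=1e-12):
    """Check d^2 <= Delta <= sqrt(2) * d up to floating-point tolerance."""
    d = hellinger_distance(p, q)
    delta = statistical_distance(p, q)
    return d * d <= delta + tol and delta <= math.sqrt(2) * d + tol

if __name__ == "__main__":
    rng = random.Random(0)
    print(all(lemma_3_10_holds(random_distribution(8, rng), random_distribution(8, rng))
              for _ in range(1000)))
```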
In particular, the lemma allows us to upper-bound the statistical distance by upper-bounding the Hellinger distance. Since our goal is to bound the distance to uniform, it is convenient to introduce the following definition.

Definition 3.11 (Hellinger Closeness to Uniform) Let $X$ be a random variable over $[M]$. The Hellinger closeness of $X$ to uniform $U_{[M]}$ is

$$C(X) = \sum_i \sqrt{\frac{1}{M} \cdot \Pr[X = i]} = 1 - d(X, U_{[M]})^2.$$

Note that $C(X, Y) = C(X) \cdot C(Y)$ when $X$ and $Y$ are independent random variables, so the Hellinger closeness is well-behaved with respect to products (unlike statistical distance). By Lemma 3.10, if the Hellinger closeness $C(X)$ is close to 1, then $X$ is close to uniform in statistical distance. Recall that collision probability behaves similarly: if the collision probability $\mathrm{cp}(X)$ is close to $1/M$, then $X$ is close to uniform. In fact, by the following normalization, we can view the collision probability as the 2-norm of $X$, and the Hellinger closeness as the 1/2-norm of $X$.
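The product property of Hellinger closeness noted above is easy to confirm on small examples. This Python sketch is only an illustration; the list-based representation of distributions and the function names are assumptions made here, not notation from the paper.

```python
import math

def hellinger_closeness(p):
    """C(X) = sum_i sqrt(Pr[X = i] / M) for a distribution p over [M]."""
    m = len(p)
    return sum(math.sqrt(x / m) for x in p)

def product_distribution(p, q):
    """Joint distribution of independent X ~ p and Y ~ q, flattened to a list."""
    return [a * b for a in p for b in q]

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical example distributions
    q = [0.7, 0.3]
    joint = product_distribution(p, q)
    # For independent X and Y, C(X, Y) equals C(X) * C(Y) up to rounding.
    print(hellinger_closeness(joint), hellinger_closeness(p) * hellinger_closeness(q))
```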

Let $f(i) = M \cdot \Pr[X = i]$ for $i \in [M]$. In terms of $f(\cdot)$, the collision probability is $\mathrm{cp}(X) = (1/M^2) \cdot \sum_i f(i)^2$, and the lemma relating collision probability to statistical distance says that if the "2-norm" $M \cdot \mathrm{cp}(X) = \mathrm{E}_i[f(i)^2] \le 1 + \varepsilon$, where the expectation is over uniform $i \in [M]$, then $\Delta(X, U) \le \sqrt{\varepsilon}$. Similarly, Lemma 3.10 says that if the "1/2-norm" $C(X) = \mathrm{E}_i[\sqrt{f(i)}] \ge 1 - \varepsilon$, then $\Delta(X, U) \le \sqrt{2\varepsilon}$.

We now discuss our approach to proving Theorem 3.8. We want to show that $(H, \mathbf{Y})$ is close to uniform. All we know is that the conditional collision probability $\mathrm{cp}(Y_i | H, Y_{<i})$ is close to $1/M$ for every block. If all blocks were independent, then the overall collision probability $\mathrm{cp}(H, \mathbf{Y})$ would be small, and so $(H, \mathbf{Y})$ would be close to uniform. However, this is not true without independence, since the 2-norm tends to over-weight heavy elements. In contrast, the 1/2-norm does not suffer from this problem. Therefore, our approach is to show that small conditional collision probability implies large Hellinger closeness. Formally, we have the following lemma.

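Lemma 3.12 Let $\mathbf{X} = (X_1, \ldots, X_T)$ be jointly distributed random variables over $[M_1] \times \cdots \times [M_T]$ such that $\mathrm{cp}(X_i | X_{<i}) \le \alpha_i / M_i$ for every $i \in [T]$. Then the Hellinger closeness satisfies

$$C(\mathbf{X}) \ge \sqrt{\frac{1}{\alpha_1 \cdots \alpha_T}}.$$

With this lemma, the proof of Theorem 3.8 is immediate.

Proof of Theorem 3.8. We have $\mathrm{cp}(H) = 1/|\mathcal{H}|$ and, by the conditional collision probability bound for 2-universal families, $\mathrm{cp}(Y_i | H, Y_{<i}) \le (1 + M/K)/M$ for every $i \in [T]$. By Lemma 3.12, the Hellinger closeness satisfies $C(H, \mathbf{Y}) \ge (1 + M/K)^{-T/2} \ge 1 - MT/2K$ (recall that $K \ge MT/\varepsilon^2$). It follows by Lemma 3.10 that

$$\Delta((H, \mathbf{Y}), (H, U_{[M]^T})) \le \sqrt{2} \cdot d((H, \mathbf{Y}), (H, U_{[M]^T})) = \sqrt{2} \cdot \sqrt{1 - C(H, \mathbf{Y})} \le \sqrt{MT/K} \le \varepsilon.$$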
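To make the saving of a factor of $T$ concrete, the following Python sketch compares the entropy per block required by the classic hybrid analysis ($K \ge MT^2/\varepsilon^2$) with that of Theorem 3.8 ($K \ge MT/\varepsilon^2$), and evaluates the chain of inequalities in the proof above. The function names and sample parameters are assumptions chosen for illustration.

```python
import math

def classic_entropy_per_block(m, T, eps):
    # k = m + 2 log T + 2 log(1/eps), from K >= M T^2 / eps^2.
    return m + 2 * math.log2(T) + 2 * math.log2(1 / eps)

def improved_entropy_per_block(m, T, eps):
    # k = m + log T + 2 log(1/eps), from K >= M T / eps^2 (Theorem 3.8).
    return m + math.log2(T) + 2 * math.log2(1 / eps)

def theorem_3_8_distance_bound(M, K, T):
    # C(H, Y) >= (1 + M/K)^(-T/2), then Delta <= sqrt(2) * sqrt(1 - C(H, Y)).
    closeness_lower_bound = (1 + M / K) ** (-T / 2)
    return math.sqrt(2 * (1 - closeness_lower_bound))

if __name__ == "__main__":
    m, T, eps = 8, 2 ** 10, 0.01  # hypothetical sample parameters
    M = 2 ** m
    K = M * T / eps ** 2
    print(classic_entropy_per_block(m, T, eps) - improved_entropy_per_block(m, T, eps))  # saving of log2(T) bits
    print(theorem_3_8_distance_bound(M, K, T))  # at most eps
```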
We proceed to prove Lemma 3.12. The main idea is to use Hölder's inequality to relate two different norms. We recall Hölder's inequality first.

Lemma 3.13 (Hölder's inequality [Dur04])

• Let $F, G$ be two non-negative functions from $[M]$ to $\mathbb{R}$, and $p, q > 0$ satisfying $1/p + 1/q = 1$. Let $x$ be a uniformly random index over $[M]$. We have

$$\mathrm{E}_x[F(x) \cdot G(x)] \le \mathrm{E}_x[F(x)^p]^{1/p} \cdot \mathrm{E}_x[G(x)^q]^{1/q}.$$

• In general, let $F_1, \ldots, F_n$ be non-negative functions from $[M]$ to $\mathbb{R}$, and $p_1, \ldots, p_n > 0$ satisfying $1/p_1 + \cdots + 1/p_n = 1$. We have

$$\mathrm{E}_x[F_1(x) \cdots F_n(x)] \le \mathrm{E}_x[F_1(x)^{p_1}]^{1/p_1} \cdots \mathrm{E}_x[F_n(x)^{p_n}]^{1/p_n}.$$

Proof of Lemma 3.12. We prove it by induction on $T$. The base case $T = 1$ is already non-trivial. Let $X$ be a random variable over $[M]$ with $\mathrm{cp}(X) \le \alpha/M$; we need to show that the Hellinger closeness satisfies $C(X) \ge \sqrt{1/\alpha}$. Recall the normalization we mentioned before. Let $f(x) = M \cdot \Pr[X = x]$ for every $x \in [M]$. In terms of $f(\cdot)$, we want to show that $\mathrm{E}_x[f(x)^2] \le \alpha$ implies $\mathrm{E}_x[\sqrt{f(x)}] \ge \sqrt{1/\alpha}$. Note that $\mathrm{E}_x[f(x)] = 1$. We now apply Hölder's inequality with $F = f^{2/3}$, $G = f^{1/3}$, $p = 3$, and $q = 3/2$. We have

$$\mathrm{E}_x[f(x)] \le \mathrm{E}_x[f(x)^2]^{1/3} \cdot \mathrm{E}_x[f(x)^{1/2}]^{2/3},$$

which implies

$$C(X) = \mathrm{E}_x\big[\sqrt{f(x)}\big] \ge \mathrm{E}_x[f(x)]^{3/2} \big/ \mathrm{E}_x[f(x)^2]^{1/2} \ge \sqrt{1/\alpha}.$$
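The base case can be tested numerically by taking $\alpha = M \cdot \mathrm{cp}(X)$ exactly, so that the claim reads $C(X) \ge 1/\sqrt{M \cdot \mathrm{cp}(X)}$. The Python sketch below is an illustrative check under our own assumptions (list-based distributions, hypothetical helper names), not part of the paper's proof.

```python
import math
import random

def hellinger_closeness(p):
    m = len(p)
    return sum(math.sqrt(x / m) for x in p)

def collision_probability(p):
    return sum(x * x for x in p)

def base_case_gap(p):
    """C(X) - sqrt(1/alpha) with alpha = M * cp(X); the base case says this is >= 0."""
    alpha = len(p) * collision_probability(p)
    return hellinger_closeness(p) - math.sqrt(1.0 / alpha)

def random_distribution(m, rng):
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    rng = random.Random(0)
    print(min(base_case_gap(random_distribution(16, rng)) for _ in range(1000)))
    # A point mass meets the bound with equality up to rounding.
    print(base_case_gap([1.0, 0.0, 0.0, 0.0]))
```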

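Suppose the lemma is true for $T - 1$; we show that it is true for $T$. Let $f(x) = M_1 \cdot \Pr[X_1 = x]$. To apply the induction hypothesis, we consider the conditional random variables $(X_2, \ldots, X_T|_{X_1 = x})$ for every $x \in [M_1]$. For every $x \in [M_1]$ and $j = 2, \ldots, T$, we define $g_j(x) = M_j \cdot \mathrm{cp}((X_j|_{X_1 = x}) \,|\, (X_2, \ldots, X_{j-1}|_{X_1 = x}))$ to be the "normalized" conditional collision probability. By the induction hypothesis, we have $C(X_2, \ldots, X_T|_{X_1 = x}) \ge \sqrt{1/(g_2(x) \cdots g_T(x))}$ for every $x \in [M_1]$. It follows that

$$C(\mathbf{X}) = \mathrm{E}_x\Big[\sqrt{f(x)} \cdot C(X_2, \ldots, X_T|_{X_1 = x})\Big] \ge \mathrm{E}_x\left[\sqrt{\frac{f(x)}{g_2(x) \cdots g_T(x)}}\right].$$

We use Hölder's inequality twice to show that $\mathrm{E}_x[\sqrt{f(x)/(g_2(x) \cdots g_T(x))}] \ge \sqrt{1/(\alpha_1 \cdots \alpha_T)}$. Let us first summarize the constraints we have. By definition, we have $\mathrm{E}_x[f(x)^2] \le \alpha_1$. Fix $j \in \{2, \ldots, T\}$. Note that

$$\begin{aligned}
\mathrm{cp}(X_j | X_{<j})
&= \mathrm{E}_{x \leftarrow X_1}\big[\mathrm{cp}((X_j|_{X_1 = x}) \,|\, (X_2, \ldots, X_{j-1}|_{X_1 = x}))\big] \\
&= \mathrm{E}_{x \leftarrow X_1}\big[g_j(x)/M_j\big] \\
&= \mathrm{E}_x\big[f(x) g_j(x)\big] / M_j.
\end{aligned}$$

It follows that $\mathrm{E}_x[f(x) g_j(x)] \le \alpha_j$ for $j = 2, \ldots, T$. Now, we apply the second version of Hölder's Inequality with $F_1 = (f/(g_2 \cdots g_T))^{1/(T+1)}$, $F_j = (f g_j)^{1/(T+1)}$ for $j = 2, \ldots, T$, $p_1 = (T+1)/2$, and $p_j = T + 1$ for $j = 2, \ldots, T$ (so that $1/p_1 = 2/(T+1)$ and $1/p_j = 1/(T+1)$), which gives

$$\mathrm{E}_x\big[f(x)^{T/(T+1)}\big] \le \mathrm{E}_x\left[\sqrt{\frac{f(x)}{g_2(x) \cdots g_T(x)}}\right]^{2/(T+1)} \cdot \mathrm{E}_x[f(x) g_2(x)]^{1/(T+1)} \cdots \mathrm{E}_x[f(x) g_T(x)]^{1/(T+1)},$$

so

$$\mathrm{E}_x\left[\sqrt{\frac{f(x)}{g_2(x) \cdots g_T(x)}}\right] \ge \mathrm{E}_x\big[f(x)^{T/(T+1)}\big]^{(T+1)/2} \cdot \prod_{j=2}^{T} \mathrm{E}_x[f(x) g_j(x)]^{-1/2} \ge \mathrm{E}_x\big[f(x)^{T/(T+1)}\big]^{(T+1)/2} \cdot \sqrt{\frac{1}{\alpha_2 \cdots \alpha_T}}.$$

It remains to lower bound the first term by $\sqrt{1/\alpha_1}$. We apply Hölder again with $F = f^{2/(T+2)}$, $G = f^{T/(T+2)}$, $p = T + 2$, and $q = (T+2)/(T+1)$, which gives

$$\mathrm{E}_x[f(x)] \le \mathrm{E}_x[f(x)^2]^{1/(T+2)} \cdot \mathrm{E}_x\big[f(x)^{T/(T+1)}\big]^{(T+1)/(T+2)},$$

so

$$\mathrm{E}_x\big[f(x)^{T/(T+1)}\big]^{(T+1)/2} \ge \mathrm{E}_x[f(x)]^{(T+2)/2} \big/ \mathrm{E}_x[f(x)^2]^{1/2} \ge \sqrt{1/\alpha_1}.$$

Combining the inequalities, we have $C(\mathbf{X}) \ge \sqrt{1/(\alpha_1 \cdots \alpha_T)}$.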
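For $T = 2$, Lemma 3.12 can be checked directly on an explicit joint distribution, including dependent ones, by taking $\alpha_1 = M_1 \cdot \mathrm{cp}(X_1)$ and $\alpha_2 = M_2 \cdot \mathrm{cp}(X_2 | X_1)$ exactly. The Python sketch below is an illustrative check; representing the joint distribution as a matrix `joint[x][y]` and the helper names are assumptions made for this example.

```python
import math
import random

def lemma_3_12_gap(joint):
    """C(X1, X2) - sqrt(1 / (alpha1 * alpha2)) for joint[x][y] = Pr[X1 = x, X2 = y].

    alpha1 = M1 * cp(X1) and alpha2 = M2 * cp(X2 | X1), where
    cp(X2 | X1) = sum_x Pr[X1 = x] * cp(X2 | X1 = x).
    Lemma 3.12 (with T = 2) says the returned gap is non-negative.
    """
    M1, M2 = len(joint), len(joint[0])
    marginal = [sum(row) for row in joint]
    alpha1 = M1 * sum(p * p for p in marginal)
    cond_cp = sum(sum(v * v for v in row) / px
                  for row, px in zip(joint, marginal) if px > 0)
    alpha2 = M2 * cond_cp
    closeness = sum(math.sqrt(v / (M1 * M2)) for row in joint for v in row)
    return closeness - math.sqrt(1.0 / (alpha1 * alpha2))

def random_joint(M1, M2, rng):
    """A random joint distribution over [M1] x [M2] (illustrative helper)."""
    weights = [[rng.random() for _ in range(M2)] for _ in range(M1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

if __name__ == "__main__":
    rng = random.Random(0)
    print(min(lemma_3_12_gap(random_joint(4, 5, rng)) for _ in range(1000)))
    # Fully correlated bits: the bound holds with equality up to rounding.
    print(lemma_3_12_gap([[0.5, 0.0], [0.0, 0.5]]))
```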



4 Negative Results: How Much Entropy is Necessary? 

In this section, we provide lower bounds on the entropy needed for the data items. We show that if $K$ is not large enough, then for every hash family $\mathcal{H}$, there exists a block $K$-source $\mathbf{X} = (X_1, \ldots, X_T)$ such that the hashed sequence $\mathbf{Y} = (H(X_1), \ldots, H(X_T))$ does not satisfy the desired closeness requirements to uniform (possibly in conjunction with the hash function $H$).

4.1 Lower Bound for Statistical Distance to Uniform Distribution

Let us first consider the requirement that the joint distribution of $(H, \mathbf{Y})$ be $\varepsilon$-close to uniform. When there is only one block, this is exactly the requirement for a "strong extractor". The lower bound in the extractor literature, due to Radhakrishnan and Ta-Shma [RT00], shows that $K \ge \Omega(M/\varepsilon^2)$ is necessary, which is tight up to a constant factor. Our goal is to show that when hashing $T$ blocks, the value of $K$ required for each block increases by a factor of $T$. Intuitively, each block will produce some error (i.e., the hashed value is not close to uniform), and the overall error will accumulate over the blocks, so we need to inject more randomness per block to reduce the error. Indeed, we use this intuition to show that $K \ge \Omega(MT/\varepsilon^2)$ is necessary for the hashed sequence to be $\varepsilon$-close to uniform, matching the upper bound in Theorem 3.8. Note that the lower bound holds even for a truly random hash family. Formally, we prove the following theorem.

Theorem 4.1 Let $N$, $M$, and $T$ be positive integers and $\varepsilon \in (0, \varepsilon_0)$ a real number such that $N \ge MT/\varepsilon^2$, where $\varepsilon_0 > 0$ is a small absolute constant. Let $H : [N] \to [M]$ be a random hash function from a hash family $\mathcal{H}$. Then there exists an integer $K = \Omega(MT/\varepsilon^2)$, and a block $K$-source $\mathbf{X} = (X_1, \ldots, X_T)$ such that $(H, \mathbf{Y}) = (H, H(X_1), \ldots, H(X_T))$ is $\varepsilon$-far from uniform $(H, U_{[M]^T})$ in statistical distance.

To prove the theorem, we need to find such an $\mathbf{X}$ for every hash family $\mathcal{H}$. Following the intuition, we find an $X$ that incurs a certain error on a single block, and take $\mathbf{X} = (X_1, \ldots, X_T)$ to be $T$ i.i.d. copies of $X$. More precisely, we first find a $K$-source $X$ such that for an $\Omega(1)$-fraction of hash functions $h \in \mathcal{H}$, $h(X)$ is $\Omega(\varepsilon/\sqrt{T})$-far from uniform. This step is the same as the lower bound proof for extractors [RT00], which uses the probabilistic method. We pick $X$ to be a random flat $K$-source, i.e., a uniform distribution over a random set of size $K$, and show that $X$ satisfies the desired property with nonzero probability. The next step is to measure how the error accumulates over independent blocks. Note that for a fixed hash function $h$, the hashed sequence $(h(X_1), \ldots, h(X_T))$ consists of $T$ i.i.d. copies of $h(X)$. Reyzin [Rey04] has shown that the statistical distance increases by a factor of $\sqrt{T}$ when we have $T$ independent copies, for small $T$. However, Reyzin's result only shows an increase up to distance $O(\delta^{1/3})$, where $\delta$ is the statistical distance of the original random variables. We improve Reyzin's result to show that the $\Omega(\sqrt{T})$ growth continues until the distance reaches some absolute constant. We then use it to show that the joint distribution $(H, \mathbf{Y})$ is far from uniform.

The following lemma corresponds to the first step. 

Lemma 4.2 Let $N$ and $M$ be positive integers and $\varepsilon \in (0, 1/4)$, $\delta \in (0, 1)$ real numbers such that $N \ge M/\varepsilon^2$. Let $H : [N] \to [M]$ be a random hash function from a hash family $\mathcal{H}$. Then there exists an integer $K = \Omega(\delta^2 M/\varepsilon^2)$, and a flat $K$-source $X$ over $[N]$ such that with probability at least $1 - \delta$ over $h \leftarrow H$, $h(X)$ is $\varepsilon$-far from uniform.
Proof. Let $K = \lfloor \min\{\alpha \cdot M/\varepsilon^2, N/2\} \rfloor$ for some $\alpha$ to be determined later. Let $X$ be a random flat $K$-source over $[N]$. That is, $X = U_S$ where $S \subset [N]$ is a uniformly random size-$K$ subset of $[N]$. We claim that for every hash function $h : [N] \to [M]$,

$$\Pr_S[\, h(U_S) \text{ is } \varepsilon\text{-far from uniform} \,] \ge 1 - c \cdot \sqrt{\alpha} \tag{3}$$

for some absolute constant $c$. Let us assume (3), and prove the lemma first. Since the claim holds for every hash function $h$,

$$\Pr_{h \leftarrow H, S}[\, h(U_S) \text{ is } \varepsilon\text{-far from uniform} \,] \ge 1 - c \cdot \sqrt{\alpha}.$$

Thus, there exists a flat $K$-source $U_S$ such that

$$\Pr_{h \leftarrow H}[\, h(U_S) \text{ is } \varepsilon\text{-far from uniform} \,] \ge 1 - c \cdot \sqrt{\alpha}.$$

The lemma follows by setting $\alpha = \min\{\delta^2/c^2, 1/32\}$. We proceed to prove (3). It suffices to show that for every $y \in [M]$, with probability at least $1 - c' \cdot \sqrt{\alpha}$ over random $U_S$, the deviation of $\Pr[h(U_S) = y]$ from $1/M$ is at least $4\varepsilon/M$, where $c'$ is another absolute constant. That is,

$$\Pr_S\left[\, \left| \Pr[h(U_S) = y] - \frac{1}{M} \right| \ge \frac{4\varepsilon}{M} \,\right] \ge 1 - c' \cdot \sqrt{\alpha}. \tag{4}$$

Again, let us first see why (4) is sufficient to prove (3). Let us say that $y \in [M]$ is bad for $S$ if

$$\left| \Pr[h(U_S) = y] - \frac{1}{M} \right| \ge \frac{4\varepsilon}{M}.$$

Since Inequality (4) holds for every $y \in [M]$, we have

$$\Pr_{S, y}[\, y \text{ is bad for } S \,] \ge 1 - c' \cdot \sqrt{\alpha},$$

where $y$ is uniformly random over $[M]$. It follows that

$$\Pr_S[\, \text{at least } 1/2\text{-fraction of } y \text{ are bad for } S \,] \ge 1 - 2c' \cdot \sqrt{\alpha}.$$

Observe that if at least a $1/2$-fraction of $y$ are bad for $S$, then $\Delta(h(U_S), U_{[M]}) \ge \varepsilon$. Inequality (3) follows by setting $c = 2c'$.

It remains to prove (4). Let $T = h^{-1}(y)$. We have $\Pr[h(U_S) = y] = |S \cap T|/|S|$. Thus, recalling that $K \le \alpha M/\varepsilon^2$, (4) follows from the inequality

$$\Pr_S\left[\, \left| |S \cap T| - \frac{K}{M} \right| < \frac{4K\varepsilon}{M} \,\right] \le c' \cdot \sqrt{\alpha},$$

which follows from the claim below by setting $L = K/M$ and $\beta = 4\varepsilon\sqrt{K/M}$. (Working out the parameters, we have $c' = 4c''$; $\varepsilon < 1/4$ implies $\beta < \sqrt{L}$, and $\alpha \le 1/32$ implies $\beta < 1$.)
Claim 4.3 Let $N, K > 1$ be positive integers such that $N \ge 2K$, and $L \in [0, K/2]$, $\beta \in (0, \min\{1, \sqrt{L}\})$ real numbers. Let $S \subset [N]$ be a random subset of size $K$, and $T \subset [N]$ be a fixed subset of arbitrary size. We have

$$\Pr_S\left[\, \big| |S \cap T| - L \big| \le \beta\sqrt{L} \,\right] \le c'' \cdot \beta$$

for some absolute constant $c''$.

Intuitively, the probability in the claim is maximized when the set $T$ has size $NL/K$ so that $L = \mathrm{E}_S[|S \cap T|]$, and the claim follows by observing that in this case, the distribution has deviation $\Theta(\sqrt{L})$, and each possible outcome has probability $O(\sqrt{1/L})$. The formal proof of the claim is in Appendix A; it proceeds by expressing the probability in terms of binomial coefficients and estimating them using Stirling's formula. ■
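The probability in Claim 4.3 is a hypergeometric window probability, so for small parameters it can be computed exactly from binomial coefficients. The Python sketch below is an illustration only; the function name and the sample parameters are assumptions made here, and the code does not reproduce the Stirling-based estimate of Appendix A.

```python
from math import comb, sqrt

def intersection_window_probability(N, K, t_size, L, beta):
    """Exact Pr_S[ | |S ∩ T| - L | <= beta * sqrt(L) ].

    S is a uniformly random size-K subset of [N] and T is a fixed subset
    with |T| = t_size, so |S ∩ T| follows a hypergeometric distribution.
    """
    lo, hi = L - beta * sqrt(L), L + beta * sqrt(L)
    favorable = 0
    for j in range(max(0, K - (N - t_size)), min(K, t_size) + 1):
        if lo <= j <= hi:
            favorable += comb(t_size, j) * comb(N - t_size, K - j)
    return favorable / comb(N, K)

if __name__ == "__main__":
    # Hypothetical parameters with |T| = N * L / K, so that L = E[|S ∩ T|].
    N, K, L = 4000, 400, 100.0
    t_size = int(N * L / K)
    for beta in (0.1, 0.2, 0.4, 0.8):
        # The ratio probability / beta stays bounded, as the claim predicts.
        print(beta, intersection_window_probability(N, K, t_size, L, beta) / beta)
```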
The next step is to measure the increase of statistical distance over independent random variables.

Lemma 4.4 Let $X$ and $Y$ be random variables over $[M]$ such that $\Delta(X, Y) \ge \varepsilon$. Let $\mathbf{X} = (X_1, \ldots, X_T)$ be $T$ i.i.d. copies of $X$, and let $\mathbf{Y} = (Y_1, \ldots, Y_T)$ be $T$ i.i.d. copies of $Y$. We have

$$\Delta(\mathbf{X}, \mathbf{Y}) \ge \min\{\varepsilon_0, c\sqrt{T} \cdot \varepsilon\},$$

where $\varepsilon_0, c$ are absolute constants.

We defer the proof of the above lemma to Appendix B.



Proof of Theorem 4.1. The absolute constant $\varepsilon_0$ in the theorem is half of the $\varepsilon_0$ in Lemma 4.4. By Lemma 4.2, there is a flat $K$-source $X$ such that for a $1/2$-fraction of hash functions $h \in \mathcal{H}$, $h(X)$ is $(2\varepsilon/c\sqrt{T})$-far from uniform, for $K = \Omega((1/2)^2 M/(2\varepsilon/c\sqrt{T})^2) = \Omega(MT/\varepsilon^2)$. We set $\mathbf{X} = (X_1, \ldots, X_T)$ to be $T$ independent copies of $X$. Consider a hash function $h$ such that $h(X)$ is $(2\varepsilon/c\sqrt{T})$-far from uniform. By Lemma 4.4, $(h(X_1), \ldots, h(X_T))$ is $2\varepsilon$-far from uniform. Note that this holds for a $1/2$-fraction of hash functions $h$. It follows that

$$\Delta((H, \mathbf{Y}), (H, U_{[M]^T})) = \mathrm{E}_{h \leftarrow H}\big[\Delta((h(X_1), \ldots, h(X_T)), U_{[M]^T})\big] \ge \frac{1}{2} \cdot 2\varepsilon = \varepsilon.$$

4.2 Lower Bound for Small Collision Probability 

In this subsection, we prove lower bounds on the entropy needed per item to ensure that the sequence of hashed values is close to having small collision probability. Since this requirement is less stringent than being close to uniform, less entropy is needed from the source. The interesting setting in applications is to require the hashed sequence $(H, \mathbf{Y}) = (H, H(X_1), \ldots, H(X_T))$ to be $\varepsilon$-close to having collision probability $O(1/(|\mathcal{H}| \cdot M^T))$. Recall that in this setting, instead of requiring $K \ge MT/\varepsilon^2$, $K \ge \Omega(MT/\varepsilon)$ is sufficient for 2-universal hash functions (Theorem 3.1), and $K \ge \Omega(MT + T\sqrt{M/\varepsilon})$ is sufficient for 4-wise independent hash functions (Theorem 3.5). The main improvement from 2-universal to 4-wise independent hashing is the better dependency on $\varepsilon$. Indeed, it can be shown that if we use truly random hash functions, we can reduce the dependency on $\varepsilon$ to $\log(1/\varepsilon)$. Since we are now proving lower bounds for arbitrary hash families, we focus on the dependency on $M$ and $T$. Specifically, our goal is to show that $K = \Omega(MT)$ is necessary. More precisely, we show that when $K \ll MT$, it is possible for the hashed sequence $(H, \mathbf{Y})$ to be $.99$-far from any distribution that has collision probability less than $100/(|\mathcal{H}| \cdot M^T)$.

We use the same strategy as in the previous subsection to prove this lower bound. Fixing a hash family $\mathcal{H}$, we take $T$ independent copies $(X_1, \ldots, X_T)$ of the worst-case $X$ found in Lemma 4.2, and show that $(H, H(X_1), \ldots, H(X_T))$ is far from having small collision probability. The new ingredient is to show that when we have $T$ independent copies and $K \ll MT$, then $(h(X_1), \ldots, h(X_T))$ is very far from uniform (say, $0.99$-far) for many $h \in \mathcal{H}$. We then argue that in this case, we cannot reduce the collision probability of $(h(X_1), \ldots, h(X_T))$ by changing a small fraction of the distribution, which implies that the overall distribution $(H, \mathbf{Y})$ is far from any distribution $(H', \mathbf{Z})$ with small collision probability. Formally, we prove the following theorem.

Theorem 4.5 Let $N$, $M$, and $T$ be positive integers such that $N \ge MT$. Let $\delta \in (0, 1)$ and $\alpha > 1$ be real numbers such that $\alpha < \delta^3 \cdot e^{T/32}/128$. Let $H : [N] \to [M]$ be a random hash function from a hash family $\mathcal{H}$. There exists an integer $K = \Omega(\delta^2 MT/\log(\alpha/\delta))$, and a block $K$-source $\mathbf{X} = (X_1, \ldots, X_T)$ such that $(H, \mathbf{Y}) = (H, H(X_1), \ldots, H(X_T))$ is $(1 - \delta)$-far from any distribution $(H', \mathbf{Z})$ with $\mathrm{cp}(H', \mathbf{Z}) \le \alpha/(|\mathcal{H}| \cdot M^T)$.

Think of $\alpha$ and $\delta$ as constants. Then the theorem says that $K = \Omega(MT)$ is necessary for the hashed sequence $(H, H(X_1), \ldots, H(X_T))$ to be close to having small collision probability, matching the upper bound in Theorem 3.1. In the previous proof, we used Lemma 4.4 to measure the increase of distance over blocks. However, the lemma can only measure the progress up to some small constant. It is known that if the number of copies $T$ is larger than $\Omega(1/\varepsilon^2)$, where $\varepsilon$ is the statistical distance of the original copy, then the statistical distance goes to 1 exponentially fast. Formally, we use the following lemma.

Lemma 4.6 ([SV99]) Let $X$ and $Y$ be random variables over $[M]$ such that $\Delta(X, Y) \ge \varepsilon$. Let $\mathbf{X} = (X_1, \ldots, X_T)$ be $T$ i.i.d. copies of $X$, and let $\mathbf{Y} = (Y_1, \ldots, Y_T)$ be $T$ i.i.d. copies of $Y$. We have

$$\Delta(\mathbf{X}, \mathbf{Y}) \ge 1 - e^{-T\varepsilon^2/2}.$$
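For single-bit distributions, the statistical distance between $T$ i.i.d. copies can be computed exactly, which gives a quick way to compare it against the bound of Lemma 4.6. The Python sketch below is an illustration under our own assumptions: it restricts to Bernoulli variables (a special case chosen for simplicity), and the function names are hypothetical.

```python
from math import comb, exp

def product_distance_bernoulli(p, q, T):
    """Exact statistical distance between T i.i.d. Bernoulli(p) bits and
    T i.i.d. Bernoulli(q) bits.

    Both product distributions assign a string a probability that depends
    only on its number of ones k, so strings are grouped by k.
    """
    total = 0.0
    for k in range(T + 1):
        prob_p = p ** k * (1 - p) ** (T - k)
        prob_q = q ** k * (1 - q) ** (T - k)
        total += comb(T, k) * abs(prob_p - prob_q)
    return 0.5 * total

def lemma_4_6_bound(eps, T):
    """Lower bound 1 - exp(-T * eps^2 / 2) from Lemma 4.6."""
    return 1 - exp(-T * eps * eps / 2)

if __name__ == "__main__":
    p, q = 0.5, 0.6  # single-copy statistical distance eps = 0.1
    for T in (1, 10, 100, 400):
        print(T, product_distance_bernoulli(p, q, T), lemma_4_6_bound(0.1, T))
```

In this example the exact distance stays above the bound for each listed $T$ and approaches 1 as $T$ grows past roughly $1/\varepsilon^2$.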

We remark that Lemmas 4.4 and 4.6 are incomparable. In the parameter range of Lemma 4.4, Lemma 4.6 only gives $\Delta(\mathbf{X}, \mathbf{Y}) \ge \Omega(T\varepsilon^2)$ instead of $\Omega(\sqrt{T}\varepsilon)$. To argue that the overall distribution is far from having small collision probability, we introduce the following notion of nonuniformity.



Definition 4.7 Let $X$ be a random variable over $[M]$ with probability mass function $p$. $X$ is $(\delta, \beta)$-nonuniform if for every function $q : [M] \to \mathbb{R}$ such that $0 \le q(x) \le p(x)$ for all $x \in [M]$, and $\sum_x q(x) \ge \delta$, the function satisfies

$$\sum_{x \in [M]} q(x)^2 > \beta/M.$$

Intuitively, a distribution $X$ over $[M]$ being $(\delta, \beta)$-nonuniform means that even if we remove a $(1 - \delta)$-fraction of probability mass from $X$, the "collision probability" remains greater than $\beta/M$. In particular, $X$ is $(1 - \delta)$-far from any random variable $Y$ with $\mathrm{cp}(Y) \le \beta/M$.

Lemma 4.8 Let $X$ be a random variable over $[M]$. If $X$ is $(1 - \eta)$-far from uniform, then $X$ is $(2\sqrt{\beta \cdot \eta}, \beta)$-nonuniform for every $\beta \ge 1$.

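Proof. Let $p$ be the probability mass function of $X$, and $q : [M] \to \mathbb{R}$ be a function such that $0 \le q(x) \le p(x)$ for every $x \in [M]$, and $\sum_x q(x) \ge 2\sqrt{\beta \cdot \eta}$. Our goal is to show that $\sum_x q(x)^2 > \beta/M$. Let $T = \{x \in [M] : p(x) > 1/M\}$. Note that

$$\Delta(X, U_{[M]}) = \Pr[X \in T] - \Pr[U_{[M]} \in T] \ge 1 - \eta.$$

This implies $\Pr[X \in T] \ge 1 - \eta$ and $\mu(T) = \Pr[U_{[M]} \in T] \le \eta$. Now,

$$\sum_{x \in T} q(x) \ge 2\sqrt{\beta\eta} - \Pr[X \notin T] \ge 2\sqrt{\beta\eta} - \eta > \sqrt{\beta\eta},$$

where the last step uses $0 < \eta < 1 \le \beta$, and $\mu(T) \le \eta$ implies $|T| \le \eta M$, so by the Cauchy-Schwarz inequality

$$\sum_{x \in [M]} q(x)^2 \ge \sum_{x \in T} q(x)^2 \ge \frac{1}{|T|} \Big( \sum_{x \in T} q(x) \Big)^2 > \frac{\beta\eta}{\eta M} = \frac{\beta}{M}.$$

We are ready to prove Theorem 4.5.

Proof of Theorem 4.5. By Lemma 4.2 with $\varepsilon = \sqrt{2\ln(128\alpha/\delta^3)/T} < 1/4$, there is a flat $K$-source $X$ such that for a $(1 - \delta/4)$-fraction of hash functions $h \in \mathcal{H}$, $h(X)$ is $\varepsilon$-far from uniform, for $K = \Omega((\delta/4)^2 M/\varepsilon^2) = \Omega(\delta^2 MT/\log(\alpha/\delta))$. We set $\mathbf{X} = (X_1, \ldots, X_T)$ to be $T$ independent copies of $X$. Consider a hash function $h$ such that $h(X)$ is $\varepsilon$-far from uniform. By Lemma 4.6, $(h(X_1), \ldots, h(X_T))$ is $(1 - \eta)$-far from uniform, for $\eta = e^{-T\varepsilon^2/2} = \delta^3/128\alpha$. By Lemma 4.8, $(h(X_1), \ldots, h(X_T))$ is $(\delta/4, 2\alpha/\delta)$-nonuniform for a $(1 - \delta/4)$-fraction of hash functions $h$. By the first statement of Lemma 4.9 below, this implies that $(H, \mathbf{Y})$ is $(1 - \delta)$-far from any distribution $(H', \mathbf{Z})$ with collision probability at most $\alpha/(|\mathcal{H}| \cdot M^T)$. ■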
Lemma 4.9 Let $(H, Y)$ be a joint distribution over $\mathcal{H} \times [M]$ such that the marginal distribution $H$ is uniform over $\mathcal{H}$. Let $\varepsilon, \delta, \alpha$ be positive real numbers.

1. If $Y|_{H=h}$ is $(\delta/4, 2\alpha/\delta)$-nonuniform for at least a $(1 - \delta/4)$-fraction of $h \in \mathcal{H}$, then $(H, Y)$ is $(1 - \delta)$-far from any distribution $(H', Z)$ with $\mathrm{cp}(H', Z) \le \alpha/(|\mathcal{H}| \cdot M)$.

2. If $Y|_{H=h}$ is $(0.1, 2\alpha/\varepsilon)$-nonuniform for at least a $2\varepsilon$-fraction of $h \in \mathcal{H}$, then $(H, Y)$ is $\varepsilon$-far from any distribution $(H', Z)$ with $\mathrm{cp}(H', Z) \le \alpha/(|\mathcal{H}| \cdot M)$.

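Proof. We introduce the following notation first. For every $h \in \mathcal{H}$, we define $q_h : [M] \to \mathbb{R}$ by

$$q_h(y) = \min\{\Pr[(H, Y) = (h, y)], \Pr[(H', Z) = (h, y)]\}$$

for every $y \in [M]$. We also define $f : \mathcal{H} \to \mathbb{R}$ by

$$f(h) = \sum_{y \in [M]} q_h(y) \le \Pr[H = h] = \frac{1}{|\mathcal{H}|}.$$

For the first statement, let $(H', Z)$ be a random variable over $\mathcal{H} \times [M]$ that is $(1 - \delta)$-close to $(H, Y)$. We need to show that $\mathrm{cp}(H', Z) > \alpha/(|\mathcal{H}| \cdot M)$. Note that $\sum_h f(h) = 1 - \Delta((H, Y), (H', Z)) \ge \delta$. So there are at least a $(3\delta/4)$-fraction of hash functions $h$ with $f(h) \ge (\delta/4)/|\mathcal{H}|$. At least a $(3\delta/4) - (\delta/4) = \delta/2$-fraction of $h$ satisfy both $f(h) \ge (\delta/4)/|\mathcal{H}|$ and $Y|_{H=h}$ is $(\delta/4, 2\alpha/\delta)$-nonuniform. By the definition of nonuniformity, applied to the function $y \mapsto |\mathcal{H}| \cdot q_h(y)$, for each such $h$ we have

$$\sum_y q_h(y)^2 > \frac{1}{|\mathcal{H}|^2} \cdot \frac{2\alpha}{\delta M}.$$

Therefore,

$$\mathrm{cp}(H', Z) \ge \sum_{h, y} q_h(y)^2 > \left( \frac{\delta}{2} \cdot |\mathcal{H}| \right) \cdot \frac{1}{|\mathcal{H}|^2} \cdot \frac{2\alpha}{\delta M} = \frac{\alpha}{|\mathcal{H}| \cdot M}.$$

Similarly, for the second statement, let $(H', Z)$ be a random variable over $\mathcal{H} \times [M]$ that is $\varepsilon$-close to $(H, Y)$. We need to show that $\mathrm{cp}(H', Z) > \alpha/(|\mathcal{H}| \cdot M)$. Note that $\sum_h f(h) = 1 - \Delta((H, Y), (H', Z)) \ge 1 - \varepsilon$. So there are at least a $(1 - \varepsilon/0.9)$-fraction of $h$ with $f(h) \ge 0.1/|\mathcal{H}|$. At least a $2\varepsilon - \varepsilon/0.9 \ge \varepsilon/2$-fraction of hash functions satisfy both $f(h) \ge 0.1/|\mathcal{H}|$ and $Y|_{H=h}$ is $(0.1, 2\alpha/\varepsilon)$-nonuniform. By the definition of nonuniformity, for each such $h$ we have

$$\sum_y q_h(y)^2 > \frac{1}{|\mathcal{H}|^2} \cdot \frac{2\alpha}{\varepsilon M}.$$

Therefore,

$$\mathrm{cp}(H', Z) \ge \sum_{h, y} q_h(y)^2 > \left( \frac{\varepsilon}{2} \cdot |\mathcal{H}| \right) \cdot \frac{1}{|\mathcal{H}|^2} \cdot \frac{2\alpha}{\varepsilon M} = \frac{\alpha}{|\mathcal{H}| \cdot M}.$$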
4.3 Lower Bounds for the Distribution of Hashed Values Only

We can extend our lower bounds to the distribution of the hashed sequence $\mathbf{Y} = (H(X_1), \ldots, H(X_T))$ alone (without $H$) for both closeness requirements, at the price of losing the dependency on $\varepsilon$ and incurring some dependency on the size of the hash family. Let $2^d = |\mathcal{H}|$ be the size of the hash family. The dependency on $d$ is necessary. Intuitively, the hashed sequence $\mathbf{Y}$ contains at most $T \cdot m$ bits of entropy, and the input $(H, X_1, \ldots, X_T)$ contains at least $d + T \cdot k$ bits of entropy. When $d$ is large enough, it is possible that all the randomness of the hashed sequence comes from the randomness of the hash family. Indeed, if $H$ is $T$-wise independent (which is possible with $d \approx T \cdot m$), then $(H(X_1), \ldots, H(X_T))$ is uniform when $X_1, \ldots, X_T$ are all distinct. Therefore,

$$\Delta((H(X_1), \ldots, H(X_T)), U_{[M]^T}) \le \Pr[\, \text{not all } X_1, \ldots, X_T \text{ are distinct} \,].$$

Thus, $K = \Omega(T^2)$ (independent of $M$) suffices to make the hashed value close to uniform.
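The quadratic dependence on $T$ in this remark is the usual birthday bound. The Python sketch below illustrates it in the simplest setting, which is an assumption made here for concreteness: the $X_i$ are i.i.d. and uniform over a set of size $K$ (a flat $K$-source repeated independently). The function names are hypothetical.

```python
def birthday_collision_probability(K, T):
    """Exact Pr[not all X_1, ..., X_T are distinct] for T i.i.d. samples
    drawn uniformly from a set of size K."""
    prob_all_distinct = 1.0
    for i in range(T):
        prob_all_distinct *= max(K - i, 0) / K
    return 1.0 - prob_all_distinct

def union_bound(K, T):
    """Union bound over the T(T-1)/2 pairs, each colliding with probability 1/K."""
    return min(1.0, T * (T - 1) / (2 * K))

if __name__ == "__main__":
    T = 100
    for K in (T, T * T, 100 * T * T):
        # The collision probability becomes small once K is much larger than T^2.
        print(K, birthday_collision_probability(K, T), union_bound(K, T))
```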

Theorem 4.10 Let $N, M, T$ be positive integers, and $d$ a positive real number such that $N \ge MT/d$. Let $\delta \in (0, 1)$, $\alpha > 1$ be real numbers such that $\alpha \cdot 2^d < \delta^3 \cdot e^{T/32}/128$. Let $H : [N] \to [M]$ be a random hash function from a hash family $\mathcal{H}$ of size at most $2^d$. There exists an integer $K = \Omega(\delta^2 MT/(d \cdot \log(\alpha/\delta)))$, and a block $K$-source $\mathbf{X} = (X_1, \ldots, X_T)$ such that $\mathbf{Y} = (H(X_1), \ldots, H(X_T))$ is $(1 - \delta)$-far from any distribution $\mathbf{Z} = (Z_1, \ldots, Z_T)$ with $\mathrm{cp}(\mathbf{Z}) \le \alpha/M^T$. In particular, $\mathbf{Y}$ is $(1 - \delta)$-far from uniform.

Think of $\alpha$ and $\delta$ as constants. Then the theorem says that when the hash function contains $d \le T/(32 \ln 2) - O(1)$ bits of randomness, $K = \Omega(MT/d)$ is necessary for the hashed sequence to be close to uniform. For example, in some typical hash applications, $N = \mathrm{poly}(M)$ and the hash function is 2-universal or $O(1)$-wise independent. In this case, $d = O(\log M)$ and we need $K = \Omega(MT/\log M)$. (Recall that our upper bound says that $K = O(MT) $ suffices.)

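Proof. We will deduce the theorem from Theorem 4.5. Replacing the parameter $\alpha$ by $\alpha \cdot 2^d$ in Theorem 4.5, we know that there exists an integer $K = \Omega(\delta^2 MT/(d \cdot \log(\alpha/\delta)))$ and a block $K$-source $\mathbf{X} = (X_1, \ldots, X_T)$ such that $(H, \mathbf{Y}) = (H, H(X_1), \ldots, H(X_T))$ is $(1 - \delta)$-far from any distribution $(H', \mathbf{Z})$ with $\mathrm{cp}(H', \mathbf{Z}) \le \alpha \cdot 2^d/(2^d \cdot M^T) = \alpha/M^T$. Now, suppose we are given a random variable $\mathbf{Z}$ on $[M]^T$ with $\Delta(\mathbf{Y}, \mathbf{Z}) < 1 - \delta$. Then we can define an $H'$ such that $\Delta((H, \mathbf{Y}), (H', \mathbf{Z})) = \Delta(\mathbf{Y}, \mathbf{Z})$. (Indeed, define the conditional distribution $H'|_{\mathbf{Z}=\mathbf{z}}$ to equal $H|_{\mathbf{Y}=\mathbf{z}}$ for every $\mathbf{z} \in [M]^T$.) Then we have

$$\mathrm{cp}(\mathbf{Z}) \ge \mathrm{cp}(H', \mathbf{Z}) > \frac{\alpha}{M^T}.$$

■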

One limitation of the above lower bound is that it only works when $d \le T/(32 \ln 2) - O(1)$. For example, the lower bound cannot be applied when the hash function is $T$-wise independent. Although $d = \Omega(T)$ may not be interesting in practice, for the sake of completeness, we provide another simple lower bound to cover this parameter region.

Theorem 4.11 Let $N, M, T$ be positive integers, and $\delta \in (0, 1)$, $\alpha > 1$, $d > 0$ real numbers. Let $H : [N] \to [M]$ be a random hash function from a hash family $\mathcal{H}$ of size at most $2^d$. Suppose $K \le N$ is an integer such that $K \le (\delta^2/(4\alpha \cdot 2^d))^{1/T} \cdot M$. Then there exists a block $K$-source $\mathbf{X} = (X_1, \ldots, X_T)$ such that $\mathbf{Y} = (H(X_1), \ldots, H(X_T))$ is $(1 - \delta)$-far from any distribution $\mathbf{Z} = (Z_1, \ldots, Z_T)$ with $\mathrm{cp}(\mathbf{Z}) \le \alpha/M^T$. In particular, $\mathbf{Y}$ is $(1 - \delta)$-far from uniform.

Again, think of $\alpha$ and $\delta$ as constants. The theorem says that $K = \Omega(M/2^{d/T})$ is necessary for the hashed sequence to be close to uniform. In particular, when $d = \Theta(T)$, $K = \Omega(M)$ is necessary. Theorem 4.10 gives the same conclusion, but only works for $d \le T/(32 \ln 2) - O(1)$. On the other hand, when $d = o(T)$, Theorem 4.10 gives the better lower bound $K = \Omega(MT/d)$.

Proof. Let $X$ be any flat $K$-source, i.e., a uniform distribution over a set of size $K$. We simply take $\mathbf{X} = (X_1, \ldots, X_T)$ to be $T$ independent copies of $X$. Note that $\mathbf{Y}$ has support at most as large as that of $(H, \mathbf{X})$. Thus,

$$|\mathrm{supp}(\mathbf{Y})| \le |\mathrm{supp}(H, \mathbf{X})| = 2^d \cdot K^T \le \frac{\delta^2}{4\alpha} \cdot M^T.$$

Therefore, $\mathbf{Y}$ is $(1 - \delta^2/4\alpha)$-far from uniform. By Lemma 4.8, $\mathbf{Y}$ is $(1 - \delta)$-far from any distribution $\mathbf{Z} = (Z_1, \ldots, Z_T)$ with $\mathrm{cp}(\mathbf{Z}) \le \alpha/M^T$. ■
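The support-counting step in this proof amounts to simple arithmetic: the threshold on $K$ is chosen so that $2^d \cdot K^T$ is at most a $\delta^2/4\alpha$ fraction of $M^T$. The Python sketch below evaluates both quantities; the function names and sample parameters are assumptions made for this illustration.

```python
import math

def max_K_theorem_4_11(M, T, d, delta, alpha):
    """The threshold (delta^2 / (4 * alpha * 2^d))^(1/T) * M from Theorem 4.11."""
    return (delta ** 2 / (4 * alpha * 2 ** d)) ** (1.0 / T) * M

def support_fraction(M, T, d, K):
    """Upper bound 2^d * K^T / M^T on the fraction of [M]^T covered by supp(Y)."""
    return 2 ** d * (K / M) ** T

if __name__ == "__main__":
    M, T, d, delta, alpha = 1024, 64, 64, 0.5, 2.0  # hypothetical sample parameters
    K = math.floor(max_K_theorem_4_11(M, T, d, delta, alpha))
    # With d = Theta(T), the threshold is a constant fraction of M.
    print(K, support_fraction(M, T, d, K), delta ** 2 / (4 * alpha))
```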

4.4 Lower Bound for 2-universal Hash Functions 

In this subsection, we show that Theorem 3.1 is almost tight in the following sense. We show that there exist $K = \Omega(MT/(\varepsilon \cdot \log(1/\varepsilon)))$, a 2-universal hash family $\mathcal{H}$, and a block $K$-source $\mathbf{X}$ such that $(H, \mathbf{Y})$ is $\varepsilon$-far from having collision probability $100/(|\mathcal{H}| \cdot M^T)$. The improvement over Theorem 4.5 is the almost tight dependency on $\varepsilon$. Recall that Theorem 3.1 says that for a 2-universal hash family, $K = O(MT/\varepsilon)$ suffices. The upper and lower bounds differ by a factor of $\log(1/\varepsilon)$. In particular, our result for 4-wise independent hash functions (Theorem 3.5) cannot be achieved with 2-universal hash functions. The lower bound can further be extended to the distribution of the hashed sequence $\mathbf{Y} = (H(X_1), \ldots, H(X_T))$ as in the previous subsection. Furthermore, since the 2-universal hash family we use has small size, we only pay a factor of $O(\log M)$ in the lower bound on $K$. Formally, we prove the following results.

Theorem 4.12 For every prime power $M$ and real numbers $\varepsilon \in (0, 1/4)$ and $\alpha \ge 1$, the following holds. For all integers $t$ and $N$ such that $\varepsilon \cdot M^{t-1} \ge 1$ and $N \ge 6\varepsilon M^{2t}$, and for $T = \lceil \varepsilon^2 M^{2t-1} \log(\alpha/\varepsilon) \rceil$,$^1$ there exists an integer $K = \Omega(MT/(\varepsilon \cdot \log(\alpha/\varepsilon)))$, a 2-universal hash family $\mathcal{H}$ from $[N]$ to $[M]$, and a block $K$-source $\mathbf{X} = (X_1, \ldots, X_T)$ such that $(H, \mathbf{Y}) = (H, H(X_1), \ldots, H(X_T))$ is $\varepsilon$-far from any distribution $(H', \mathbf{Z})$ with $\mathrm{cp}(H', \mathbf{Z}) \le \alpha/(|\mathcal{H}| \cdot M^T)$.

$^1$For technical reasons, our lower bound proof does not work for every sufficiently large $T$. However, note that the density of $T$ such that the lower bound holds is

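Theorem 4.13 For every prime power $M$ and real numbers $\varepsilon \in (0, 1/4)$ and $\alpha \ge 1$, the following holds. For all integers $t$ and $N$ such that $\varepsilon \cdot M^{t-1} \ge 1$ and $N \ge 6\varepsilon M^{2t}$, and for $T = \lceil \varepsilon^2 M^{2t-1} \log(\alpha M/\varepsilon) \rceil$, there exists an integer $K = \Omega(MT/(\varepsilon \cdot \log(\alpha M/\varepsilon)))$, a 2-universal hash family $\mathcal{H}$ from $[N]$ to $[M]$, and a block $K$-source $\mathbf{X} = (X_1, \ldots, X_T)$ such that $\mathbf{Y} = (H(X_1), \ldots, H(X_T))$ is $\varepsilon$-far from any distribution $\mathbf{Z}$ with $\mathrm{cp}(\mathbf{Z}) \le \alpha/M^T$.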

Basically, the idea is to show that the Markov Inequality applied in the proof of Theorem 3.1 (see Inequality (1)) is tight for a single block. More precisely, we show that there exists a 2-universal hash family $\mathcal{H}$ and a $K$-source $X$ such that with probability $\varepsilon$ over $h \leftarrow H$, $\mathrm{cp}(h(X)) \ge 1/M + \Omega(1/(K\varepsilon))$. Intuitively, if we take $T = \Omega(K\varepsilon \cdot \log(\alpha/\varepsilon)/M)$ independent copies of such $X$, then the collision probability will satisfy $\mathrm{cp}(h(X_1), \ldots, h(X_T)) \ge (1 + \Omega(M/(K\varepsilon)))^T/M^T \ge \alpha/(\varepsilon M^T)$, and so the overall collision probability is $\mathrm{cp}(H, \mathbf{Y}) \ge \alpha/(|\mathcal{H}| \cdot M^T)$. Formally, we analyze our construction below using Hellinger distance, and show that the collision probability remains high even after modifying an $O(\varepsilon)$-fraction of the distribution.

Proof of Theorem 4.12. Fix a prime power $M$ and $\varepsilon > 0$; we identify $[M]$ with the finite field $\mathbb{F}$ of size $M$. Let $t$ be an integer parameter such that $M^{t-1} \ge 1/\varepsilon$. Recall that the set $\mathcal{H}_0$ of linear functions $\{h_a : \mathbb{F}^t \to \mathbb{F}\}_{a \in \mathbb{F}^t}$, where $h_a(x) = \sum_i a_i x_i$, is 2-universal. Note that picking a random hash function $h \leftarrow \mathcal{H}_0$ is equivalent to picking a random vector $a \leftarrow \mathbb{F}^t$. Two special properties of $\mathcal{H}_0$ are (i) when $a = 0$, the whole domain $\mathbb{F}^t$ is sent to $0 \in \mathbb{F}$, and (ii) the size of the hash family $|\mathcal{H}_0|$ is the same as the size of the domain, namely $|\mathbb{F}^t|$. We will use $\mathcal{H}_0$ as a building block in our construction.

We proceed to construct the hash family $\mathcal{H}$. We partition the domain $[N]$ into several sub-domains, and apply a different hash function to each sub-domain. Let $s$ be an integer parameter to be determined later. We require $N \ge s \cdot M^t$, and partition $[N]$ into $D_0, D_1, \ldots, D_s$, where each of $D_1, \ldots, D_s$ has size $M^t$ and is identified with $\mathbb{F}^t$, and $D_0$ is the remaining part of $[N]$. In our construction, the data $X$ will never come from $D_0$. Thus, without loss of generality, we can assume $D_0$ is empty. For every $i = 1, \ldots, s$, we use a linear hash function $h_{a_i} \in \mathcal{H}_0$ to send $D_i$ to $\mathbb{F}$. Thus, a hash function $h \in \mathcal{H}$ consists of $s$ linear hash functions $(h_{a_1}, \ldots, h_{a_s})$, and can be described by $s$ vectors $a_1, \ldots, a_s \in \mathbb{F}^t$. Note that to make $\mathcal{H}$ 2-universal, it suffices to pick $a_1, \ldots, a_s$ pairwise independently. Specifically, we identify $\mathbb{F}^t$ with the finite field $\hat{\mathbb{F}}$ of size $M^t$, and pick $(a_1, \ldots, a_s)$ by picking $a, b \in \hat{\mathbb{F}}$ and outputting $(a + \alpha_1 \cdot b, a + \alpha_2 \cdot b, \ldots, a + \alpha_s \cdot b)$ for some distinct constants $\alpha_1, \ldots, \alpha_s \in \hat{\mathbb{F}}$. Formally, we define the hash family to be

$$\mathcal{H} = \{h^{a,b} : [N] \to [M]\}_{a, b \in \hat{\mathbb{F}}}, \quad \text{where } h^{a,b} = (h_{a + \alpha_1 b}, \ldots, h_{a + \alpha_s b}) = (h^{a,b}_1, \ldots, h^{a,b}_s).$$

It is easy to verify that $\mathcal{H}$ is indeed 2-universal, and $|\mathcal{H}| = M^{2t}$.
We next define a single block $K$-source $X$ that makes the Markov Inequality (1) tight. We
simply take $X$ to be a uniform distribution over $D_1 \cup \cdots \cup D_s$, and so $K = s \cdot M^t$. Consider
a hash function $h^{a,b} \in \mathcal{H}$. If all $h^{a,b}_i$ are non-zero and distinct, then $h^{a,b}(X)$ is the uniform
distribution. If exactly one $h^{a,b}_i = 0$, then $h^{a,b}$ sends $M^t + (s-1)M^{t-1}$ elements in $[N]$ to $0$,
and $(s-1)M^{t-1}$ elements to each nonzero $y \in \mathbb{F}$. Let us call such $h^{a,b}$ bad hash functions.
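Thus, if $h^{a,b}$ is bad, then

$$\mathrm{cp}(h^{a,b}(X)) = \left(\frac{M^t + (s-1)M^{t-1}}{K}\right)^2 + (M-1)\left(\frac{(s-1)M^{t-1}}{K}\right)^2 = \frac{1}{M} + \frac{M-1}{M s^2}.$$

Note that $h^{a,b}$ is bad with probability

$$\Pr[\text{exactly one } h^{a,b}_i = 0] = \Pr[b \ne 0 \wedge \exists i\, (a + \alpha_i b = 0)] = \left(1 - \frac{1}{M^t}\right) \cdot \frac{s}{M^t} \ge \frac{s}{2M^t}.$$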
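
The counting behind the bad hash functions can be checked exhaustively on a toy instance. The following is a minimal sketch under simplifying assumptions that are ours, not the paper's: it takes $t = 1$ and $M$ prime (so that $\mathbb{F}^t$ and the field of size $M^t$ are both just the integers modulo $M$), and it picks the distinct constants $\alpha_i$ as $1, \ldots, s$. The closed forms it is compared against, $(1 - 1/M^t) \cdot s/M^t$ for the fraction of bad functions and $1/M + (M-1)/(M s^2)$ for their collision probability, follow by expanding the element counts stated above.

```python
from collections import Counter
from fractions import Fraction

def output_distribution(a_list, M):
    """Distribution of h(X) for X uniform over s copies of Z_M, where copy i
    is hashed by x -> a_i * x mod M (the t = 1 case; assumes M is prime)."""
    counts = Counter((a_i * x) % M for a_i in a_list for x in range(M))
    K = len(a_list) * M
    return {y: Fraction(counts.get(y, 0), K) for y in range(M)}

def collision_probability(dist):
    return sum(p * p for p in dist.values())

M, s = 5, 3                      # toy parameters, chosen only for illustration
alphas = list(range(1, s + 1))   # hypothetical choice of distinct constants

bad, bad_cps = 0, set()
for a in range(M):
    for b in range(M):
        a_list = [(a + al * b) % M for al in alphas]
        if a_list.count(0) == 1:  # exactly one sub-domain collapses to 0
            bad += 1
            bad_cps.add(collision_probability(output_distribution(a_list, M)))

bad_fraction = Fraction(bad, M * M)
```

With these toy values the enumeration finds that every bad function has the same collision probability, which is strictly above the uniform value $1/M$.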



We set $s = \lceil 4\varepsilon M^t \rceil \le M^t$. It follows that with probability at least $2\varepsilon$ over $h \leftarrow \mathcal{H}$, the collision
probability satisfies $\mathrm{cp}(h(X)) \ge 1/M + 1/(4K\varepsilon)$, as we intuitively desired. However, instead
of working with collision probability directly, we need to use Hellinger closeness to measure
the growth of distance to uniform (see Definition 3.9). The following claim upper bounds the
Hellinger closeness of $h(X)$ for bad hash functions $h$. The proof of the claim is deferred to the
end of this section.



Claim 4.14 Suppose $h$ is a bad hash function defined as above. Then the Hellinger closeness
of $h(X)$ satisfies $C(h(X)) \le 1 - M/(64K\varepsilon)$.

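Finally, for every integer $T \in [\varepsilon^2 M^{2t-1} \log(\alpha/\varepsilon),\ c_0 \cdot \varepsilon^2 M^{2t-1} \log(\alpha/\varepsilon)]$, we can write
$T = c \cdot (64K\varepsilon/M) \cdot \ln(800\alpha/\varepsilon)$ for some constant $c < c_0$. Let $\mathbf{X} = (X_1, \ldots, X_T)$ be $T$ independent
copies of $X$. We now show that $K, \mathcal{H}, \mathbf{X}$ satisfy the conclusion of the theorem. That is,
$K = \Theta(MT/(\varepsilon \log(\alpha/\varepsilon)))$ (as follows from the definition of $T$) and $(H, \mathbf{Y}) = (H, H(X_1), \ldots, H(X_T))$
is $\varepsilon$-far from any distribution $(H', \mathbf{Z})$ with $\mathrm{cp}(H', \mathbf{Z}) \le \alpha/(|\mathcal{H}| \cdot M^T)$.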

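Consider the distribution $(h(X_1), \ldots, h(X_T))$ for a bad hash function $h \in \mathcal{H}$. From the
above claim, the Hellinger closeness satisfies

$$C(h(X_1), \ldots, h(X_T)) = C(h(X))^T \le \left(1 - \frac{M}{64K\varepsilon}\right)^T \le e^{-MT/(64K\varepsilon)} = \left(\frac{\varepsilon}{800\alpha}\right)^{c},$$

where the last step uses $T = c \cdot (64K\varepsilon/M) \cdot \ln(800\alpha/\varepsilon)$. By Lemma 3.11 and the definition of
Hellinger closeness, we have

$$\Delta((h(X_1), \ldots, h(X_T)), U_{[M]^T}) \ge 1 - C(h(X_1), \ldots, h(X_T)) \ge 1 - \left(\frac{\varepsilon}{800\alpha}\right)^{c}.$$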
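
The two facts used in this step, that Hellinger closeness to uniform multiplies over independent copies and that statistical distance to uniform is at least one minus the closeness, can be illustrated numerically. This is a sketch on a toy distribution, not the paper's computation; the helper names and the parameter values are ours, and the closeness is taken to be $\sum_x \sqrt{\Pr[X = x]/|\mathrm{support}|}$, which agrees with the $(1/M) \sum_x \sqrt{p(x)}$ form used in the proof of the claim.

```python
import math
from itertools import product

def closeness_to_uniform(dist):
    """Hellinger closeness to uniform: sum_x sqrt(Pr[X = x] / |support|)."""
    n = len(dist)
    return sum(math.sqrt(p / n) for p in dist)

def distance_to_uniform(dist):
    """Statistical distance to the uniform distribution on the same support."""
    n = len(dist)
    return 0.5 * sum(abs(p - 1 / n) for p in dist)

def product_distribution(dist, T):
    """Probability vector of T independent copies of dist."""
    return [math.prod(ps) for ps in product(dist, repeat=T)]

M, s, T = 3, 3, 4  # toy parameters, chosen only for illustration
# Output distribution of a bad hash function: p(0) = 1 + (M-1)/s, p(x) = 1 - 1/s.
single = [(1 + (M - 1) / s) / M] + [(1 - 1 / s) / M] * (M - 1)
joint = product_distribution(single, T)

c_single = closeness_to_uniform(single)
c_joint = closeness_to_uniform(joint)
delta_joint = distance_to_uniform(joint)
```

Here `c_joint` matches `c_single ** T` up to floating-point error, and `delta_joint` is at least `1 - c_joint`, so driving the closeness down with more copies forces the distance to uniform up.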






By Lemma 4.8, $(h(X_1), \ldots, h(X_T))$ is $(0.1, 2\alpha/\varepsilon)$-nonuniform for at least a $2\varepsilon$-fraction of bad
hash functions $h$. By the second statement of Lemma 4.9, this implies $(H, \mathbf{Y})$ is $\varepsilon$-far from any
distribution $(H', \mathbf{Z})$ with $\mathrm{cp}(H', \mathbf{Z}) \le \alpha/(|\mathcal{H}| \cdot M^T)$.

In sum, given $M, \varepsilon, \alpha, t$ that satisfy the premise of the theorem, we set $K = \lceil 4\varepsilon M^t \rceil \cdot M^t$,
and proved that for every $N \ge K$ and $T = \Theta((K\varepsilon/M) \cdot \ln(\alpha/\varepsilon))$, the conclusion of the theorem
is true. It remains to prove Claim 4.14.

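Proof of claim: Let $p(x) = M \cdot \Pr[h(X) = x]$ for every $x \in \mathbb{F}$. For a bad hash
function $h$, we have $p(0) = 1 + (M-1)/s$, and $p(x) = 1 - 1/s$ for every $x \ne 0$.
We will upper bound $C(h(X)) = (1/M) \cdot \sum_x \sqrt{p(x)}$ using Taylor series. Recall that
for $z \ge 0$, there exist some $z', z'' \in [0, z]$ such that

$$\sqrt{1+z} = 1 + \frac{z}{2} + \frac{z^2}{2} \cdot \left(-\frac{1}{4(1+z')^{3/2}}\right) = 1 + \frac{z}{2} - \frac{z^2}{8(1+z')^{3/2}},$$

$$\sqrt{1-z} = 1 - \frac{z}{2\sqrt{1-z''}} \le 1 - \frac{z}{2}.$$

We have

$$C(h(X)) = \frac{1}{M} \sum_x \sqrt{p(x)} \le \frac{1}{M}\left(\left(1 + \frac{M-1}{2s} - \frac{(M-1)^2}{8s^2 (1 + (M-1)/s)^{3/2}}\right) + (M-1)\left(1 - \frac{1}{2s}\right)\right) = 1 - \frac{(M-1)^2}{8Ms^2(1 + (M-1)/s)^{3/2}}.$$

Recall that $M \ge 2$, $s = \varepsilon M^t \ge M$, and $s^2 = K\varepsilon$, so we have

$$C(h(X)) \le 1 - \frac{M}{64K\varepsilon}.$$

□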
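
The chain of bounds in the proof of the claim can be checked numerically. The sketch below is only an illustration on a handful of parameter values of our choosing; it writes the final bound as $1 - M/(64 s^2)$, using the identification $s^2 = K\varepsilon$ made in the proof, and the function names are hypothetical.

```python
import math

def hellinger_closeness_bad(M, s):
    """C(h(X)) = (1/M) * sum_x sqrt(p(x)) for a bad h, with
    p(0) = 1 + (M-1)/s and p(x) = 1 - 1/s for x != 0."""
    return (math.sqrt(1 + (M - 1) / s) + (M - 1) * math.sqrt(1 - 1 / s)) / M

def taylor_upper_bound(M, s):
    """Intermediate bound 1 - (M-1)^2 / (8 M s^2 (1 + (M-1)/s)^(3/2))."""
    return 1 - (M - 1) ** 2 / (8 * M * s ** 2 * (1 + (M - 1) / s) ** 1.5)

checks = []
for M in (2, 3, 5, 16, 101):                 # requires M >= 2
    for s in (M, 2 * M, 10 * M, 1000 * M):   # requires s >= M
        C = hellinger_closeness_bad(M, s)
        # C <= Taylor bound <= 1 - M / (64 s^2)
        checks.append(C <= taylor_upper_bound(M, s) <= 1 - M / (64 * s ** 2))
```

Every entry of `checks` is true on these values, consistent with the two inequalities holding whenever $M \ge 2$ and $s \ge M$.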



Recall that $|\mathcal{H}| = M^{2t}$. Theorem 4.13 follows from Theorem 4.12 by exactly the same
argument as in the proof of Theorem 4.10.
A Technical Lemma on Binomial Coefficients



Lemma A.1 (Claim 4.3, restated) Let $N, K > 1$ be integers such that $N \ge 2K$, and let $L \in [0, K/2]$,
$\beta \in (0, \min\{1, \sqrt{L}\})$ be real numbers. Let $S \subseteq [N]$ be a random subset of size $K$, and
$T \subseteq [N]$ be a fixed subset of $[N]$ of arbitrary size. We have

$$\Pr_S\left[\,\big|\,|S \cap T| - L\,\big| \le \beta\sqrt{L}\,\right] \le O(\beta).$$
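
To make the quantity in the lemma concrete, the following sketch computes it exactly for a toy instance. The size of $S \cap T$ follows a hypergeometric distribution, so the probability is a finite sum of ratios of binomial coefficients. The parameter values and function names are ours and serve only as an illustration; the code does not attempt to verify the constant hidden in $O(\beta)$.

```python
from fractions import Fraction
from math import comb, sqrt

def hypergeom_pmf(N, K, T, R):
    """Pr[|S ∩ T| = R] for a uniformly random K-subset S of [N]
    and a fixed subset of size T."""
    return Fraction(comb(T, R) * comb(N - T, K - R), comb(N, K))

def window_probability(N, K, T, L, beta):
    """Pr[ | |S ∩ T| - L | <= beta * sqrt(L) ], summed over integers R in the window."""
    lo, hi = L - beta * sqrt(L), L + beta * sqrt(L)
    return sum(hypergeom_pmf(N, K, T, R)
               for R in range(0, min(K, T) + 1) if lo <= R <= hi)

N, K, T, L = 200, 60, 100, 30   # toy parameters with N >= 2K and L <= K/2
total = sum(hypergeom_pmf(N, K, T, R) for R in range(0, min(K, T) + 1))
probs = {beta: float(window_probability(N, K, T, L, beta))
         for beta in (0.25, 0.5, 0.9)}
```

The dictionary `probs` shows how the probability of landing in the window shrinks as $\beta$ decreases.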



Proof. By an abuse of notation, we use T to denote the size of set T. The probability can 
be expressed as a sum of binomial coefficients as follows. 
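
$$\Pr_S\left[\,\big|\,|S \cap T| - L\,\big| \le \beta\sqrt{L}\,\right] = \sum_{R \in [L - \beta\sqrt{L},\, L + \beta\sqrt{L}]} \frac{\binom{T}{R}\binom{N-T}{K-R}}{\binom{N}{K}}.$$

Note that there are at most $\lfloor 2\beta\sqrt{L} \rfloor + 1$ terms, so it suffices to show that for every
$R \in [L - \beta\sqrt{L}, L + \beta\sqrt{L}]$,

$$f(T) \stackrel{\mathrm{def}}{=} \frac{\binom{T}{R}\binom{N-T}{K-R}}{\binom{N}{K}} \le O\left(\sqrt{\frac{1}{L}}\right).$$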



We use the following bound on binomial coefficients, which can be derived from Stirling's 
formula. 
Claim A.2 For integers $0 < i < a$, $0 < j < b$, we have

$$\frac{\binom{a}{i}\binom{b}{j}}{\binom{a+b}{i+j}} \le O\left(\sqrt{\frac{a \cdot b \cdot (i+j) \cdot (a+b-i-j)}{i \cdot (a-i) \cdot j \cdot (b-j) \cdot (a+b)}}\right).$$

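Note that $L \in [0, K/2]$ implies $K - R = \Omega(K)$. When $2R \le T \le N - 2K + 2R$, applying
Claim A.2 with $a = T$, $i = R$, $b = N - T$, $j = K - R$, we have

$$f(T) \le O\left(\sqrt{\frac{T}{T-R} \cdot \frac{N-T}{N-T-K+R} \cdot \frac{K}{K-R} \cdot \frac{N-K}{N} \cdot \frac{1}{R}}\right) = O\left(\sqrt{\frac{1}{L}}\right),$$

as desired. Note that when $N \ge 2K$, such $T$ exists. Finally, observe that $\beta^2 < L$ implies
$R \ge 1$, and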

$$\frac{f(T)}{f(T+1)} = \frac{(T - R + 1)(N - T)}{(T + 1)(N - T - K + R)}.$$

It follows that $f(T)$ is increasing when $T < 2R$, and $f(T)$ is decreasing when $T > N - 2K + 2R$.
Therefore, $f(T) \le f(2R) = O(\sqrt{1/L})$ for $T < 2R$, and $f(T) \le f(N - 2K + 2R) = O(\sqrt{1/L})$
for $T > N - 2K + 2R$, which completes the proof. ■
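
The ratio identity and the resulting monotonicity of $f$ can be confirmed with exact rational arithmetic on a small instance. This is an illustrative check with parameters of our choosing, not part of the proof; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def f(N, K, R, T):
    """f(T) = C(T, R) * C(N - T, K - R) / C(N, K)."""
    return Fraction(comb(T, R) * comb(N - T, K - R), comb(N, K))

def ratio_formula(N, K, R, T):
    """Closed form for f(T) / f(T + 1)."""
    return Fraction((T - R + 1) * (N - T), (T + 1) * (N - T - K + R))

N, K, R = 60, 20, 5              # toy parameters with N >= 2K and 1 <= R <= K/2
valid_T = range(R, N - K + R)    # keeps both f(T) and f(T + 1) strictly positive

ratio_ok = all(f(N, K, R, T) / f(N, K, R, T + 1) == ratio_formula(N, K, R, T)
               for T in valid_T)
# f is non-decreasing for T < 2R ...
increasing_ok = all(f(N, K, R, T) <= f(N, K, R, T + 1) for T in range(R, 2 * R))
# ... and non-increasing for T > N - 2K + 2R.
decreasing_ok = all(f(N, K, R, T) >= f(N, K, R, T + 1)
                    for T in range(N - 2 * K + 2 * R + 1, N - K + R))
```

All three flags come out true here, matching the claim that the maximum of $f$ outside the middle range is attained at its endpoints $2R$ and $N - 2K + 2R$.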



B Proof of Lemma 4.4

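Lemma B.1 (Lemma 4.4, restated) Let $X$ and $Y$ be random variables over $[M]$ such that
$\Delta(X, Y) \ge \varepsilon$. Let $\mathbf{X} = (X_1, \ldots, X_T)$ be $T$ i.i.d. copies of $X$, and let $\mathbf{Y} = (Y_1, \ldots, Y_T)$ be $T$
i.i.d. copies of $Y$. We have

$$\Delta(\mathbf{X}, \mathbf{Y}) \ge \min\{\varepsilon_0, c\sqrt{T} \cdot \varepsilon\},$$

where $\varepsilon_0, c$ are absolute constants.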

Proof. We prove the lemma by the following two claims. The first claim reduces the lemma
to the special case that $X$ is a Bernoulli random variable with bias $\Omega(\varepsilon)$ and $Y$ is a uniform
coin. The second claim proves the special case.
Claim B.2 Let $X$ and $Y$ be random variables over $[M]$ such that $\Delta(X, Y) = \varepsilon$. Then there
exists a randomized function $f : [M] \to \{0, 1\}$ such that $f(Y) = U_{\{0,1\}}$, and $\Delta(f(X), f(Y)) \ge \varepsilon/2$.

Proof of claim: By the definition, there exists a set $T \subseteq [M]$ such that

$$|\Pr[X \in T] - \Pr[Y \in T]| = \varepsilon.$$

Without loss of generality, we can assume that $\Pr[Y \in T] = p \le 1/2$ (because we
can take the complement of $T$). Let $g : [M] \to \{0, 1\}$ be the indicator function of $T$,
so we have $\Pr_Y[g(Y) = 1] = p$. For every $x \in [M]$, we define $f(x) = g(x) \vee b$, where
$b$ is a biased coin with $\Pr[b = 0] = 1/(2(1 - p))$. The claim follows by observing
that

$$\Pr[f(Y) = 0] = \Pr[g(Y) = 0 \wedge b = 0] = (1 - p) \cdot \frac{1}{2(1 - p)} = \frac{1}{2},$$

and

$$\Delta(f(X), f(Y)) \ge \Delta(X, Y) \cdot \Pr[b = 0] \ge \varepsilon/2.$$

□
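
The reduction in Claim B.2 can be traced on a small example with exact arithmetic. The sketch below is illustrative only: the two distributions are arbitrary toy choices, and `reduce_to_coin` is a hypothetical helper that returns the probabilities that $f(X)$ and $f(Y)$ equal $0$ for $f(x) = g(x) \vee b$.

```python
from fractions import Fraction

def reduce_to_coin(px, py, T_set):
    """Given distributions px, py over [M] (lists of Fractions) and a set T_set
    with Pr[Y in T] = p <= 1/2, return (Pr[f(X) = 0], Pr[f(Y) = 0]) for
    f(x) = g(x) OR b, where g indicates T and Pr[b = 0] = 1 / (2 (1 - p))."""
    p = sum(py[i] for i in T_set)
    assert p <= Fraction(1, 2), "take the complement of T first"
    b_zero = 1 / (2 * (1 - p))
    fx_zero = (1 - sum(px[i] for i in T_set)) * b_zero  # g(X) = 0 and b = 0
    fy_zero = (1 - p) * b_zero                          # g(Y) = 0 and b = 0
    return fx_zero, fy_zero

px = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
py = [Fraction(1, 4)] * 4
# A set achieving the statistical distance: the points where px exceeds py.
T_set = [i for i in range(4) if px[i] > py[i]]
eps = abs(sum(px[i] - py[i] for i in T_set))

fx_zero, fy_zero = reduce_to_coin(px, py, T_set)
binary_distance = abs(fx_zero - fy_zero)
```

In this example $f(Y)$ is an unbiased coin, and the distance between $f(X)$ and $f(Y)$ equals $\varepsilon \cdot \Pr[b = 0]$, which is at least $\varepsilon/2$ because $\Pr[b = 0] \ge 1/2$.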

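Claim B.3 Let $X$ be a Bernoulli random variable over $\{0, 1\}$ such that $\Pr[X = 0] = 1/2 - \varepsilon$.
Let $\mathbf{X} = (X_1, \ldots, X_T)$ be $T$ independent copies of $X$. Then

$$\Delta(\mathbf{X}, U_{\{0,1\}^T}) \ge \min\{\varepsilon_0, c\sqrt{T}\varepsilon\},$$

where $\varepsilon_0, c$ are absolute constants independent of $\varepsilon$ and $T$.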

Proof of claim: For $x \in \{0, 1\}^T$, let the weight $\mathrm{wt}(x)$ of $x$ be the number of
1's in $x$. Let

$$S = \left\{ x \in \{0, 1\}^T : \mathrm{wt}(x) \le \frac{T}{2} - \sqrt{T} \right\}$$

be the subset of $\{0, 1\}^T$ with small weight. (This choice of $S$ is the main source
of improvement in our proof compared to that of Reyzin [Rey04], who instead
considers the set of all $x$ with weight at most $T/2$.) For every $x \in S$, we have

$$\Pr[\mathbf{X} = x] \le \frac{1}{2^T} \cdot (1 - 2\varepsilon)^{T/2 + \sqrt{T}} \cdot (1 + 2\varepsilon)^{T/2 - \sqrt{T}} \le \left(1 - \min\{\Omega(\sqrt{T}\varepsilon), \Omega(1)\}\right) \cdot \Pr[U_{\{0,1\}^T} = x].$$

By standard results on large deviation, we have

$$\Pr[U_{\{0,1\}^T} \in S] \ge \Omega(1).$$

Combining the above two inequalities, we get

$$\Delta(\mathbf{X}, U_{\{0,1\}^T}) \ge \Pr[U_{\{0,1\}^T} \in S] - \Pr[\mathbf{X} \in S] \ge \min\{\Omega(\sqrt{T}\varepsilon), \Omega(1)\} \cdot \Omega(1) = \min\{c\sqrt{T}\varepsilon, \varepsilon_0\}$$

for some absolute constants $c, \varepsilon_0$, which completes the proof. □
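
The role of the low-weight set $S$ can be seen numerically. The sketch below, with toy values of $T$ and $\varepsilon$ that are our own choice, computes the gap $\Pr[U \in S] - \Pr[\mathbf{X} \in S]$ for $S = \{x : \mathrm{wt}(x) \le T/2 - \sqrt{T}\}$ and compares it with the exact statistical distance, both evaluated by grouping strings by weight. It does not attempt to pin down the absolute constants $c$ and $\varepsilon_0$.

```python
from math import comb, sqrt

def weight_event_gap(T, eps):
    """Pr[U in S] - Pr[X in S] for S = {x : wt(x) <= T/2 - sqrt(T)}, where
    X consists of T i.i.d. bits with Pr[X_i = 0] = 1/2 - eps."""
    p_one = 0.5 + eps
    cutoff = T / 2 - sqrt(T)
    weights = [w for w in range(T + 1) if w <= cutoff]
    pr_uniform = sum(comb(T, w) for w in weights) / 2 ** T
    pr_biased = sum(comb(T, w) * p_one ** w * (1 - p_one) ** (T - w)
                    for w in weights)
    return pr_uniform - pr_biased

def exact_distance(T, eps):
    """Exact statistical distance between X and uniform on {0,1}^T."""
    p_one = 0.5 + eps
    return 0.5 * sum(comb(T, w) * abs(p_one ** w * (1 - p_one) ** (T - w) - 0.5 ** T)
                     for w in range(T + 1))

eps = 0.01
rows = [(T, weight_event_gap(T, eps), exact_distance(T, eps))
        for T in (16, 64, 256)]
```

Each row is `(T, gap, distance)`. The gap is a lower bound on the distance, since statistical distance is the maximum such difference over all events, and the distance grows as $T$ increases.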

Note that applying the same randomized function $f$ to two random variables $X$ and $Y$
cannot increase the statistical distance, i.e., $\Delta(f(X), f(Y)) \le \Delta(X, Y)$. The lemma follows
immediately from the above two claims:

$$\Delta(\mathbf{X}, \mathbf{Y}) \ge \Delta((f_1(X_1), \ldots, f_T(X_T)), (f_1(Y_1), \ldots, f_T(Y_T))) \ge \min\{\varepsilon_0, c\sqrt{T}\varepsilon\},$$

where $f_1, \ldots, f_T$ are independent copies of the randomized function defined in Claim B.2, and $\varepsilon_0, c$
are absolute constants from Claim B.3. ■